Preface


Was plane geometry your favorite math course in high school? Did you like
proving theorems? Are you sick of memorizing integrals? If so, real analysis could be
your cup of tea. In contrast to calculus and elementary algebra, it involves neither 
formula manipulation nor applications to other fields of science. None. It is pure 
mathematics, and I hope it appeals to you, the budding pure mathematician. 

This book is set out for college juniors and seniors who love math and who profit 
from pictures that illustrate the math. Rarely is a picture a proof, but I hope a good 
picture will cement your understanding of why something is true. Seeing is believing. 

Chapter 1 gets you off the ground. The whole of analysis is built on the system 
of real numbers R, and especially on its Least Upper Bound property. Unlike many 
analysis texts that assume R and its properties as axioms, Chapter 1 contains a 
natural construction of R and a natural proof of the LUB property. You will also see 
why some infinite sets are more infinite than others, and how to visualize things in 
four dimensions. 

Chapter 2 is about metric spaces, especially subsets of the plane. This chapter 
contains many pictures you have never seen. ε and δ will become your friends. Most
of the presentation uses sequences and limits, in contrast to open coverings. It may 
be less elegant but it’s easier to begin with. You will get to know the Cantor set well. 

Chapter 3 is about Freshman Calculus - differentiation, integration, L’Hôpital’s
Rule, and so on, for functions of a single variable - but this time you will find out 
why what you were taught before is actually true. In particular you will see that a 
bounded function is integrable if and only if it is continuous almost everywhere, and 
how this fact explains many other things about integrals. 

Chapter 4 is about functions viewed en masse. You can treat a set of functions 
as a metric space. The “points” in the space aren’t numbers or vectors - they are 
functions. What is the distance between two functions? What should it mean that a 
sequence of functions converges to a limit function? What happens to derivatives and 
integrals when your sequence of functions converges to a limit function? When can 
you approximate a bad function with a good one? What is the best kind of function? 
What does the typical continuous function look like? (Answer: “horrible.”) 

Chapter 5 is about Sophomore Calculus - functions of several variables, partial 
derivatives, multiple integrals, and so on. Again you will see why what you were 
taught before is actually true. You will revisit Lagrange multipliers (with a picture 
proof), the Implicit Function Theorem, etc. The main new topic for you will be
differential forms. They are presented not as mysterious “multi-indexed expressions,” 
but rather as things that assign numbers to smooth domains. A 1-form assigns to 
a smooth curve a number, a 2-form assigns to a surface a number, a 3-form assigns 
to a solid a number, and so on. Orientation (clockwise, counterclockwise, etc.) is 
important and lets you see why cowlicks are inevitable - the Hairy Ball Theorem. 
The culmination of the differential forms business is Stokes’ Formula, which unifies 
what you know about div, grad, and curl. It also leads to a short and simple proof 
of the Brouwer Fixed Point Theorem - a fact usually considered too advanced for 
undergraduates. 

Chapter 6 is about Lebesgue measure and integration. It is not about measure 
theory in the abstract, but rather about measure theory in the plane, where you can 
see it. Surely I am not the first person to have rediscovered J.C. Burkill’s approach
to the Lebesgue integral, but I hope you will come to value it as much as I do. After 
you understand a few nontrivial things about area in the plane, you are naturally led 
to define the integral as the area under the curve - the elementary picture you saw in 
high school calculus. Then the basic theorems of Lebesgue integration simply fall out 
from the picture. Included in the chapter is the subject of density points - points at 
which a set “clumps together.” I consider density points central to Lebesgue measure 
theory. 

At the end of each chapter are a great many exercises. Intentionally, there is no 
solution manual. You should expect to be confused and frustrated when you first 
try to solve the harder problems. Frustration is a good thing. It will strengthen you 
and it is the natural mental state of most mathematicians most of the time. Join the 
club! When you do solve a hard problem yourself or with a group of your friends, you 
will treasure it far more than something you pick up off the web. For encouragement, 
read Sam Young’s story at http://legacyrlmoore.org/reference/young.html. 


I have adopted Moe Hirsch’s star system for the exercises. One star is hard, two
stars is very hard, and a three-star exercise is a question to which I do not know the
answer. Likewise, starred sections are more challenging.


Berkeley, California, USA 


Charles Chapman Pugh 
Real Numbers


1 Preliminaries 

Before we discuss the system of real numbers it is best to make a few general remarks 
about mathematical outlook. 


Language 

By and large, mathematics is expressed in the language of set theory. Your first 
order of business is to get familiar with its vocabulary and grammar. A set is a 
collection of elements. The elements are members of the set and are said to belong 
to the set. For example, N denotes the set of natural numbers, 1, 2, 3, ... . The 
members of N are whole numbers greater than or equal to 1. Is 10 a member of N? 
Yes, 10 belongs to N. Is 0 a member of N? No. We write 

x ∈ A and y ∉ B

to indicate that the element x is a member of the set A and y is not a member of B.
Thus, 6819 ∈ N and 0 ∉ N.

We try to write capital letters for sets and small letters for elements of sets. 
Other standard sets have standard names. The set of integers is denoted by Z, 
which stands for the German word Zahlen. (An integer is a positive whole number, 
zero, or a negative whole number.) Is √2 ∈ Z? No, √2 ∉ Z. How about -15? Yes,
-15 ∈ Z.

The set of rational numbers is called Q, which stands for “quotient.” (A
rational number is a fraction of integers, the denominator being nonzero.) Is √2 a
member of Q? No, √2 does not belong to Q. Is π a member of Q? No. Is 1.414 a
member of Q? Yes.

You should practice reading the notation “{x ∈ A :” as “the set of x that belong
to A such that.” The empty set is the collection of no elements and is denoted by
∅. Is 0 a member of the empty set? No, 0 ∉ ∅.

A singleton set has exactly one member. It is denoted as {x} where x is the 
member. Similarly if exactly two elements x and y belong to a set, the set is denoted 
as {x, y}. 

If A and B are sets and each member of A also belongs to B then A is a subset 
of B and A is contained in B. We write†

A ⊂ B.

Is N a subset of Z? Yes. Is it a subset of Q? Yes. If A is a subset of B and B is a
subset of C, does it follow that A is a subset of C? Yes. Is the empty set a subset of
N? Yes, ∅ ⊂ N. Is 1 a subset of N? No, but the singleton set {1} is a subset of N.
Two sets are equal if each member of one belongs to the other. Each is a subset of 
the other. This is how you prove two sets are equal: Show that each element of the 
first belongs to the second, and each element of the second belongs to the first. 

The union of the sets A and B is the set A ∪ B, each of whose elements belongs
to either A, or to B, or to both A and to B. The intersection of A and B is the set
A ∩ B each of whose elements belongs to both A and to B. If A ∩ B is the empty
set then A and B are disjoint. The symmetric difference of A and B is the set
A Δ B each of whose elements belongs to A but not to B, or belongs to B but not to
A. The difference of A to B is the set A \ B whose elements belong to A but not
to B. See Figure 1.
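These operations are easy to try out on small finite sets. The following Python sketch uses the built-in set type; the particular sets A and B and the helper name sets_equal are illustrative choices, not part of the text.

```python
# Illustrative finite sets (arbitrary choices, not from the text).
A = {1, 2, 3, 4}
B = {3, 4, 5}

union = A | B                  # in A, or in B, or in both
intersection = A & B           # in both A and B
symmetric_difference = A ^ B   # in exactly one of A and B
difference = A - B             # in A but not in B, written A \ B in the text

# Membership versus inclusion: 1 is a member of A, {1} is a subset of A.
one_is_member = 1 in A
singleton_is_subset = {1} <= A
empty_is_subset = set() <= A   # the empty set is a subset of every set


def sets_equal(X, Y):
    """Two sets are equal when each is a subset of the other."""
    return X <= Y and Y <= X


# The symmetric difference is the union of the two differences.
identity_holds = sets_equal(A ^ B, (A - B) | (B - A))
```

The function sets_equal mirrors the proof strategy described above: show each inclusion separately.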

A class is a collection of sets. The sets are members of the class. For example 
we could consider the class ℰ of sets of even natural numbers. Is the set {2, 15} a
member of ℰ? No. How about the singleton set {6}? Yes. How about the empty
set? Yes, each element of the empty set is even. 

When is one class a subclass of another? When each member of the former belongs
also to the latter. For example the class 𝒯 of sets of positive integers divisible by 10
is a subclass of ℰ, the class of sets of even natural numbers, and we write 𝒯 ⊂ ℰ.
Each set that belongs to the class 𝒯 also belongs to the class ℰ. Consider another
example. Let 𝒮 be the class of singleton subsets of N and let 𝒟 be the class of subsets
of N each of which has exactly two elements. Thus {10} ∈ 𝒮 and {2, 6} ∈ 𝒟. Is 𝒮 a
subclass of 𝒟? No. The members of 𝒮 are singleton sets and they are not members of
𝒟. Rather they are subsets of members of 𝒟. Note the distinction, and think about
it.

†When some mathematicians write A ⊂ B they mean that A is a subset of B, but A ≠ B. We
do not adopt this convention. We accept A ⊂ A.

Figure 1 Venn diagrams of union, intersection, and differences
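The distinction between "member of" and "subset of a member of" can be checked on a small model. In this Python sketch the universe 1, ..., 5 stands in for N (an assumption made to keep things finite), and classes are modeled as sets of frozensets because Python requires set members to be hashable.

```python
from itertools import combinations

# A small stand-in for N (assumption: we only look at 1..5).
universe = range(1, 6)

# Classes are sets whose members are themselves (frozen) sets.
singletons = {frozenset(c) for c in combinations(universe, 1)}  # singleton subsets
doubletons = {frozenset(c) for c in combinations(universe, 2)}  # two-element subsets

# The class of singletons is not a subclass of the class of doubletons:
# no singleton is itself a two-element set.
is_subclass = singletons <= doubletons

# But every singleton is a subset of some member of the class of doubletons.
each_singleton_inside_a_doubleton = all(
    any(s <= d for d in doubletons) for s in singletons
)
```

Here `<=` between two classes compares their members, while `s <= d` compares the elements of two sets. The same symbol does two jobs, which is exactly the distinction to keep in mind.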


Here is an analogy. Each citizen is a member of his or her country - I am an 
element of the USA and Tony Blair is an element of the UK. Each country is a 
member of the United Nations. Are citizens members of the UN? No, countries are 
members of the UN. 


In the same vein is the concept of an equivalence relation on a set S. It is
a relation s ~ s' that holds between some members s, s' ∈ S and it satisfies three
properties: For all s, s', s'' ∈ S


(a) s ~ s.

(b) s ~ s' implies that s' ~ s.

(c) s ~ s' ~ s'' implies that s ~ s''.


Figure 2 on the next page shows how the equivalence relation breaks S into
disjoint subsets called equivalence classes† defined by mutual equivalence: The
equivalence class containing s consists of all elements s' ∈ S equivalent to s and
is denoted [s]. The element s is a representative of its equivalence class. Think
again of citizens and countries. Say two citizens are equivalent if they are citizens of
the same country. The world of equivalence relations is egalitarian: I represent my
equivalence class USA just as much as does the president.


†The phrase “equivalence class” is standard and widespread, although it would be more consistent
with the idea that a class is a collection of sets to refer instead to an “equivalence set.”
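To see an equivalence relation break a set into classes, here is a small Python sketch. The function name equivalence_classes and the example relation (same remainder on division by 3) are assumptions chosen for illustration. The code trusts that the relation supplied really is reflexive, symmetric, and transitive.

```python
def equivalence_classes(elements, equivalent):
    """Partition `elements` into classes, given an equivalence relation
    `equivalent(s, t)` (assumed reflexive, symmetric, and transitive)."""
    classes = []
    for s in elements:
        for cls in classes:
            # By transitivity it is enough to compare with one representative.
            if equivalent(s, cls[0]):
                cls.append(s)
                break
        else:
            # s is equivalent to nothing seen so far: it starts a new class.
            classes.append([s])
    return classes


# Hypothetical example: the integers 0..9, equivalent when they leave the
# same remainder on division by 3.
S = range(10)
classes = equivalence_classes(S, lambda s, t: s % 3 == t % 3)
```

Comparing against a single representative is legitimate only because any member of a class represents it equally well, which is the egalitarian point made above.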
Figure 2 Equivalence classes and representatives


Truth 

When is a mathematical statement accepted as true? Generally, mathematicians 
would answer “Only when it has a proof inside a familiar mathematical framework.” 
A picture may be vital in getting you to believe a statement. An analogy with 
something you know to be true may help you understand it. An authoritative teacher 
may force you to parrot it. A formal proof, however, is the ultimate and only reason 
to accept a mathematical statement as true. A recent debate in Berkeley focused the 
issue for me. According to a math teacher from one of our local private high schools, 
his students found proofs in mathematics were of little value, especially compared to 
“convincing arguments.” Besides, the mathematical statements were often seen as 
obviously true and in no need of formal proof anyway. I offer you a paraphrase of 
Bob Osserman’s response. 


But a convincing argument is not a proof. A mathematician
generally wants both, and certainly would be less likely to accept a convincing
argument by itself than a formal proof by itself. Least of all would a
mathematician accept the proposal that we should generally replace proofs with
convincing arguments.

There has been a tendency in recent years to take the notion of proof 
down from its pedestal. Critics point out that standards of rigor change 
from century to century. New gray areas appear all the time. Is a proof 
by computer an acceptable proof? Is a proof that is spread over many 
journals and thousands of pages, that is too long for any one person to 
master, a proof? And of course, venerable Euclid is full of flaws, some 
filled in by Hilbert, others possibly still lurking. 
Clearly it is worth examining closely and critically the most basic notion
of mathematics, that of proof. On the other hand, it is important to bear 
in mind that all distinctions and niceties about what precisely constitutes 
a proof are mere quibbles compared to the enormous gap between any 
generally accepted version of a proof and the notion of a convincing
argument. Compare Euclid, with all his flaws, to the most eminent of the
ancient exponents of the convincing argument - Aristotle. Much of
Aristotle’s reasoning was brilliant, and he certainly convinced most thoughtful
people for over a thousand years. In some cases his analyses were exactly 
right, but in others, such as heavy objects falling faster than light ones, 
they turned out to be totally wrong. In contrast, there is not to my 
knowledge a single theorem stated in Euclid’s Elements that in the course 
of two thousand years turned out to be false. That is quite an
astonishing record, and an extraordinary validation of proof over convincing
argument. 


Here are some guidelines for writing a rigorous mathematical proof. See also 
Exercise 0. 


1. Name each object that appears in your proof. (For instance, you might begin 
your proof with a phrase, “Consider a set A, and elements x,y that belong to 
A,” etc.) 

2. Draw a diagram that captures how these objects relate, and extract logical 
statements from it. Quantifiers precede the objects quantified; see below. 

3. Become confident that the mathematical assertion you are trying to prove is 
really true before trying to write down a proof of it. If there is a specific function
involved - say sin x^α - draw the graph of the function for a few values of α
before starting any ε, δ analysis. Belief first and proof second.

4. Proceed step by step, each step depending on the hypotheses, previously proved 
theorems, or previous steps in your proof. 

5. Check for “rigor”: All cases have been considered, all details have been tied 
down, and circular reasoning has been avoided. 

6. Before you sign off on the proof, check for counterexamples and any implicit 
assumptions you made that could invalidate your reasoning. 
Logic

Among the most frequently used logical symbols in math are the quantifiers ∀
and ∃. Read them always as “for each” and “there exists.” Avoid reading ∀ as “for
all,” which in English has a more inclusive connotation. Another common symbol is
⇒. Read it as “implies.”

The rules of correct mathematical grammar are simple: Quantifiers appear at the 
beginning of a sentence, they modify only what follows them in the sentence, and 
assertions occur at the end of the sentence. Here is an example. 

(1) For each integer n there is a prime number p which is greater than n. 

In symbols the sentence reads 


∀n ∈ Z ∃p ∈ P such that p > n,

where P denotes the set of prime numbers. (A prime number is a whole number 
greater than 1 whose only divisors in N are itself and 1.) In English, the same idea 
can be reexpressed as 



(2) Every integer is less than some prime number.


or 

(3) A prime number can always be found which is bigger than any integer. 

These sentences are correct in English grammar, but disastrously WRONG when 
transcribed directly into mathematical grammar. They translate into disgusting 
mathematical gibberish: 

(WRONG (2)) ∀n ∈ Z n < p ∃p ∈ P

(WRONG (3)) ∃p ∈ P p > n ∀n ∈ Z.

Moral Quantifiers first and assertions last. In stating a theorem, try to apply the 
same principle. Write the hypothesis first and the conclusion second. See Exercise 0. 

The order in which quantifiers appear is also important. Contrast the next two 
sentences in which we switch the position of two quantified phrases. 

(4) (∀n ∈ N) (∀m ∈ N) (∃p ∈ P) such that (nm < p).

(5) (∀n ∈ N) (∃p ∈ P) such that (∀m ∈ N) (nm < p).

(4) is a true statement but (5) is false. A quantifier modifies the part of a sentence
that follows it but not the part that precedes it. This is another reason never to end 
with a quantifier. 

Moral Quantifier order is crucial. 
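A finite experiment can neither prove (4) nor refute (5), but it can show where the order of the quantifiers bites. The Python sketch below is an illustration only. The helper names is_prime and prime_above and the search ranges are assumptions made for the example.

```python
def is_prime(k):
    """Trial division; adequate for the small numbers used here."""
    if k < 2:
        return False
    d = 2
    while d * d <= k:
        if k % d == 0:
            return False
        d += 1
    return True


def prime_above(k):
    """Return the smallest prime strictly greater than k."""
    p = k + 1
    while not is_prime(p):
        p += 1
    return p


# Sentence (4): p is chosen after n and m, so p may depend on both.
witnesses_4 = {(n, m): prime_above(n * m)
               for n in range(1, 6) for m in range(1, 6)}

# Sentence (5): p is chosen after n but before m. Take n = 1. Whatever
# prime p is proposed, the choice m = p gives nm = p, which is not < p.
n = 1
candidates = [p for p in range(2, 50) if is_prime(p)]
defeated = all(not (n * p < p) for p in candidates)
```

In (4) the witness p is allowed to depend on the pair (n, m), which is why a dictionary keyed by pairs is natural. In (5) a single p would have to work for every m at once, and m = p already defeats it.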

There is a point at which English and mathematical meaning diverge. It concerns 
the word “or.” In mathematics “a or b” always means “a or b or both a and b,” while
in English it can mean “a or b but not both a and b.” For example, Patrick Henry
certainly would not have accepted both liberty and death in response to his cry of 
“Give me liberty or give me death.” In mathematics, however, the sentence “17 is a 
prime or 23 is a prime” is correct even though both 17 and 23 are prime. Similarly, 
in mathematics a ⇒ b means that if a is true then b is true but that b might also
be true for reasons entirely unrelated to the truth of a. In English, a ⇒ b is often
confused with b ⇒ a.

Moral In mathematics “or” is inclusive. It means and/or. In mathematics a ⇒ b is
not the same as b ⇒ a.

It is often useful to form the negation or logical opposite of a mathematical
sentence. The symbol ~ is usually used for negation, despite the fact that the same
symbol also indicates an equivalence relation. Mathematicians refer to this as an
abuse of notation. Fighting a losing battle against abuse of notation, we write ¬
for negation. For example, if m, n ∈ N then ¬(m < n) means it is not true that m is
less than n. In other words


¬(m < n) ≡ m ≥ n.

(We use the symbol ≡ to indicate that the two statements are equivalent.) Similarly,
¬(x ∈ A) means it is not true that x belongs to A. In other words,

¬(x ∈ A) ≡ x ∉ A.

Double negation returns a statement to its original meaning. Slightly more interesting
is the negation of “and” and “or.” Just for now, let us use the symbols & for “and”
and ∨ for “or.” We claim

(6) ¬(a & b) ≡ ¬a ∨ ¬b.

(7) ¬(a ∨ b) ≡ ¬a & ¬b.

For if it is not the case that both a and b are true then at least one must be false. 
This proves (6), and (7) is similar. Implication also has such interpretations: 

(8) a ⇒ b ≡ ¬a ⇐ ¬b ≡ ¬a ∨ b.

(9) ¬(a ⇒ b) ≡ a & ¬b.
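Since a and b can each be only true or false, identities of this kind can be checked by running through all four cases. The Python sketch below does so for the negation rules for “and” and “or,” for the contrapositive and “not a, or b” forms of implication, and for the negation of an implication. The table IMPLIES and the names law_6 through law_9 are labels introduced for this illustration.

```python
from itertools import product

# Truth table of implication, written out case by case:
# a => b is false only when a is true and b is false.
IMPLIES = {
    (False, False): True,
    (False, True): True,
    (True, False): False,
    (True, True): True,
}

# All four truth assignments to the pair (a, b).
rows = list(product([False, True], repeat=2))

# (6) not (a and b)  is equivalent to  (not a) or (not b)
law_6 = all((not (a and b)) == ((not a) or (not b)) for a, b in rows)

# (7) not (a or b)  is equivalent to  (not a) and (not b)
law_7 = all((not (a or b)) == ((not a) and (not b)) for a, b in rows)

# (8) a => b  is equivalent to  (not b) => (not a)  and to  (not a) or b
law_8 = all(
    IMPLIES[(a, b)] == IMPLIES[(not b, not a)] == ((not a) or b)
    for a, b in rows
)

# (9) not (a => b)  is equivalent to  a and (not b)
law_9 = all((not IMPLIES[(a, b)]) == (a and (not b)) for a, b in rows)
```

A truth table is itself a proof here, because the four rows exhaust every case.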

What about the negation of a quantified sentence such as 

¬(∀n ∈ N, ∃p ∈ P such that n < p).

The rule is: change each ∀ to ∃ and vice versa, leaving the order the same, and negate
the assertion. In this case the negation is

∃n ∈ N, ∀p ∈ P, n ≥ p.

In English it reads “There exists a natural number n, and for all primes p we have
n ≥ p.” The sentence has correct mathematical grammar but of course is false. To
help translate from mathematics to readable English, a comma can be read as “and,” 
“we have,” or “such that.” 
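On finite collections, Python's all and any play the roles of “for each” and “there exists,” so the negation rule can be tried out directly. The ranges below are finite stand-ins chosen for illustration, with the list of primes reaching past the largest sampled natural number. They are assumptions, not the full sets N and P.

```python
# Finite stand-ins (assumptions for illustration): the natural numbers
# 1..19 and the primes up to 23.
naturals = range(1, 20)
primes = [2, 3, 5, 7, 11, 13, 17, 19, 23]

# "For each n there exists p such that n < p", restricted to the samples.
original = all(any(n < p for p in primes) for n in naturals)

# Negation by the rule: swap the quantifiers, keep their order, and
# negate the assertion (n < p becomes n >= p).
negation = any(all(n >= p for p in primes) for n in naturals)

# The rule should always produce the logical opposite.
rule_agrees = (not original) == negation
```

On these samples the original sentence comes out true and its negation false, matching what holds for all of N and P.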

All mathematical assertions take an implication form a ⇒ b. The hypothesis is
a and the conclusion is b. If you are asked to prove a ⇒ b, there are several ways
to proceed. First you may just see right away why a does imply b. Fine, if you are
so lucky. Or you may be puzzled. Does a really imply b? Two routes are open to
you. You may view the implication in its equivalent contrapositive form ¬a ⇐ ¬b as
in (8). Sometimes this will make things clearer. Or you may explore the possibility
that a fails to imply b. If you can somehow deduce from the failure of a implying b
a contradiction to a known fact (for instance, if you can deduce the existence of a
planar right triangle with legs x, y but x² + y² ≠ h², where h is the hypotenuse),
then you have succeeded in making an argument by contradiction. Clearly (9) is
pertinent here. It tells you what it means that a fails to imply b, namely that a is
true and simultaneously b is false.

Euclid’s proof that N contains infinitely many prime numbers is a classic example 
of this method. The hypothesis is that N is the set of natural numbers and that P 
is the set of prime numbers. The conclusion is that P is an infinite set. The proof of 
this fact begins with the phrase “Suppose not.” It means to suppose, after all, that 
the set of prime numbers P is merely a finite set, and see where this leads you. It 
does not mean that we think P really is a finite set, and it is not a hypothesis of a 
theorem. Rather it just means that we will try to find out what awful consequences 
would follow from P being finite. In fact if P were† finite then it would consist of m
numbers p_1, ..., p_m. Their product N = 2 · 3 · 5 ··· p_m would be evenly divisible
(i.e., remainder 0 after division) by each p_i and therefore N + 1 would be evenly
divisible by no prime (the remainder of p_i divided into N + 1 would always be 1),
which would contradict the fact that every integer ≥ 2 can be factored as a product
of primes. (The latter fact has nothing to do with P being finite or not.) Since the
supposition that P is finite led to a contradiction of a known fact, prime factorization,
the supposition was incorrect, and P is, after all, infinite.
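It can help to run Euclid's argument on a concrete list. In the Python sketch below we pretend that a short list contains all the primes and watch what goes wrong. The list and the helper smallest_prime_factor are choices made for this illustration.

```python
from math import prod


def smallest_prime_factor(k):
    """Smallest prime dividing k, for an integer k >= 2 (trial division)."""
    d = 2
    while d * d <= k:
        if k % d == 0:
            return d
        d += 1
    return k


# Pretend, for the sake of argument, that these are all the primes.
supposed_primes = [2, 3, 5, 7, 11, 13]

N = prod(supposed_primes)

# Each supposed prime leaves remainder 1 when divided into N + 1.
remainders = [(N + 1) % p for p in supposed_primes]

# N + 1 is at least 2, so it has a prime factor, and that factor
# cannot be on the list.
new_prime = smallest_prime_factor(N + 1)
```

With this list the product is N = 30030 and N + 1 = 30031 = 59 · 509. So N + 1 need not be prime itself; what matters is that its prime factors are missing from the list.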

Aficionados of logic will note our heavy use here of the “law of the excluded 
middle,” to wit, that a mathematically meaningful statement is either true or false. 
The possibilities that it is neither true nor false, or that it is both true and false, are 
excluded. 

Notation The symbol ↯ indicates a contradiction. It is used when writing a proof
in longhand. 


Metaphor and Analogy 

In high school English, you are taught that a metaphor is a figure of speech in 
which one idea or word is substituted for another to suggest a likeness or similarity. 
This can occur very simply as in “The ship plows the sea.” Or it can be less direct, 
as in “His lawyers dropped the ball.” What give a metaphor its power and pleasure 
are the secondary suggestions of similarity. Not only did the lawyers make a mistake, 
but it was their own fault, and, like an athlete who has dropped a ball, they could 
not follow through with their next legal action. A secondary implication is that their 
enterprise was just a game. 

Often a metaphor associates something abstract to something concrete, as “Life 
is a journey.” The preservation of inference from the concrete to the abstract in this 
metaphor suggests that like a journey, life has a beginning and an end, it progresses 
in one direction, it may have stops and detours, ups and downs, etc. The beauty of 
a metaphor is that hidden in a simple sentence like “Life is a journey” lurk a great 
many parallels, waiting to be uncovered by the thoughtful mind. 

†In English grammar, the subjunctive mode indicates doubt, and I have written Euclid’s proof in
that form - “if P were finite” instead of “if P is finite,” “each prime would divide N evenly,” instead
of “each prime divides N evenly,” etc. At first it seems like a fine idea to write all arguments by
contradiction in the subjunctive mode, clearly exhibiting their impermanence. Soon, however, the 
subjunctive and conditional language becomes ridiculously stilted and archaic. For consistency then, 
as much as possible, use the present tense. 
Metaphorical thinking pervades mathematics to a remarkable degree. It is
often reflected in the language mathematicians choose to define new concepts. In his
construction of the system of real numbers, Dedekind could have referred to A|B as
a “type-2, order preserving equivalence class,” or worse, whereas “cut” is the right
metaphor. It corresponds closely to one’s physical intuition about the real line. See
Figure 3. In his book, Where Mathematics Comes From, George Lakoff gives a
comprehensive view of metaphor in mathematics.

An analogy is a shallow form of metaphor. It just asserts that two things are 
similar. Although simple, analogies can be a great help in accepting abstract concepts. 
When you travel from home to school, at first you are closer to home, and then you 
are closer to school. Somewhere there is a halfway stage in your journey. You know 
this, long before you study mathematics. So when a curve connects two points in 
a metric space (Chapter 2), you should expect that as a point “travels along the 
curve,” somewhere it will be equidistant between the curve’s endpoints. Reasoning 
by analogy is also referred to as “intuitive reasoning.” 

Moral Try to translate what you know of the real world to guess what is true in 
mathematics. 


Two Pieces of Advice 

A colleague of mine regularly gives his students an excellent piece of advice. When 
you confront a general problem and do not see how to solve it, make some extra 
hypotheses, and try to solve it then. If the problem is posed in n dimensions, try 
it first in two dimensions. If the problem assumes that some function is continuous, 
does it get easier for a differentiable function? The idea is to reduce an abstract 
problem to its simplest concrete manifestation, rather like a metaphor in reverse. At 
the minimum, look for at least one instance in which you can solve the problem, and 
build from there. 

Moral If you do not see how to solve a problem in complete generality, first solve it 
in some special cases. 


Here is the second piece of advice. Buy a notebook. In it keep a diary of your 
own opinions about the mathematics you are learning. Draw a picture to illustrate 
every definition, concept, and theorem. 
2 Cuts

We begin at the beginning and discuss R = the system of all real numbers from a 
somewhat theological point of view. The current mathematics teaching trend treats 
the real number system R as a given - it is defined axiomatically. Ten or so of 
its properties are listed, called axioms of a complete ordered field, and the game 
becomes to deduce its other properties from the axioms. This is something of a 
fraud, considering that the entire structure of analysis is built on the real number 
system. For what if a system satisfying the axioms failed to exist? Then one would 
be studying the empty set! However, you need not take the existence of the real 
numbers on faith alone - we will give a concise mathematical proof of it. 

It is reasonable to accept all grammar school arithmetic facts about 

The set N of natural numbers, 1, 2, 3, 4, ... .

The set Z of integers, 0, 1, -1, -2, 2, ... .

The set Q of rational numbers p/q where p, q are integers, q ≠ 0.

For example, we will admit without question facts like 2 + 2 = 4, and laws like
a + b = b + a for rational numbers a, b. All facts you know about arithmetic involving
integers or rational numbers are fair to use in homework exercises too.† It is clear
that N ⊂ Z ⊂ Q. Now Z improves N because it contains negatives and Q improves
Z because it contains reciprocals. Z legalizes subtraction and Q legalizes division.
Still, Q needs further improvement. It doesn’t admit irrational roots such as √2 or
transcendental numbers such as π. We aim to go a step beyond Q, completing it to
form R so that

N ⊂ Z ⊂ Q ⊂ R.

As an example of the fact that Q is incomplete we have 
1 Theorem No number r in Q has square equal to 2; i.e., √2 ∉ Q.

Proof To prove that every r = p/q has r² ≠ 2 we show that p² ≠ 2q². It is fair to
assume that p and q have no common factors since we would have canceled them out
beforehand.

Case 1. p is odd. Then p² is odd while 2q² is not. Therefore p² ≠ 2q².

Case 2. p is even. Since p and q have no common factors, q is odd. Then p² is divisible
by 4 while 2q² is not. Therefore p² ≠ 2q².

Since p² ≠ 2q² for all integers p, there is no rational number r = p/q whose square
is 2. □

†A subtler fact that you may find useful is the prime factorization theorem mentioned above. Any
integer ≥ 2 can be factored into a product of prime numbers. For example, 120 is the product of
primes 2 · 2 · 2 · 3 · 5. Prime factorization is unique except for the order in which the factors appear.
An easy consequence is that if a prime number p divides an integer k and if k is the product mn of
integers then p divides m or it divides n. After all, by uniqueness, the prime factorization of k is just
the product of the prime factorizations of m and n.
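A computer search is no substitute for the proof, but it is a reasonable way to become confident in the statement first. The Python sketch below looks for integers p with p² = 2q² over a range of denominators, and separately mirrors the two parity cases. The function names and the search bound 10000 are assumptions made for the illustration.

```python
from math import isqrt


def has_rational_sqrt2_with_denominator(q):
    """Return True if some integer p satisfies p*p == 2*q*q."""
    target = 2 * q * q
    p = isqrt(target)          # largest integer with p*p <= target
    return p * p == target


# Search denominators 1..10000; Theorem 1 says none can succeed.
found = [q for q in range(1, 10001)
         if has_rational_sqrt2_with_denominator(q)]


def parity_case(p, q):
    """Mirror the two cases of the proof for p, q with no common factor."""
    if p % 2 == 1:
        # Case 1: p odd, so p*p is odd while 2*q*q is even.
        return (p * p) % 2 == 1 and (2 * q * q) % 2 == 0
    # Case 2: p even forces q odd, so p*p is divisible by 4
    # while 2*q*q leaves remainder 2 on division by 4.
    return (p * p) % 4 == 0 and (2 * q * q) % 4 == 2
```

The search only inspects finitely many denominators, so it is evidence rather than proof. The parity argument is what covers every p and q at once.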


The set Q of rational numbers is incomplete. It has “gaps,” one of which occurs at 
√2. These gaps are really more like pinholes; they have zero width. Incompleteness
is what is wrong with Q. Our goal is to complete Q by filling in its gaps. An elegant 
method to arrive at this goal is Dedekind cuts in which one visualizes real numbers 
as places at which a line may be cut with scissors. See Figure 3. 




Figure 3 A Dedekind cut 

Definition A cut in Q is a pair of subsets A, B of Q such that 

(a) A ∪ B = Q, A ≠ ∅, B ≠ ∅, A ∩ B = ∅.

(b) If a ∈ A and b ∈ B then a < b.

(c) A contains no largest element.


A is the left-hand part of the cut and B is the right-hand part. We denote the
cut as x = A|B. Making a semantic leap, we now answer the question “what is a real
number?”

Definition A real number is a cut in Q.

R is the class† of all real numbers x = A|B. We will show that in a natural way R
is a complete ordered field containing Q. Before spelling out what this means, here 
are two examples of cuts. 


(i) A|B = {r ∈ Q : r < 1} | {r ∈ Q : r ≥ 1}.

(ii) A|B = {r ∈ Q : r ≤ 0 or r² < 2} | {r ∈ Q : r > 0 and r² ≥ 2}.

†The word “class” is used instead of the word “set” to emphasize that for now the members of R
are set-pairs A|B, and not the numbers that belong to A or B. The notation A|B could be shortened
to A since B is just the rest of Q. We write A|B, however, as a mnemonic device. It looks like a cut.
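Condition (c) for example (ii), that A contains no largest element, can be explored with exact rational arithmetic. In the Python sketch below the left part A is modeled by a membership test. The step (2a + 2)/(a + 2) used to climb inside A is a standard device that does not appear in the text, and the function names are assumptions made for the illustration.

```python
from fractions import Fraction


def in_A(r):
    """Membership in the left part A of cut (ii): r <= 0 or r^2 < 2."""
    return r <= 0 or r * r < 2


def larger_member(a):
    """Given a in A, return a strictly larger element of A.

    For a > 0 the value (2a + 2)/(a + 2) exceeds a by (2 - a^2)/(a + 2),
    which is positive when a^2 < 2, and its square minus 2 equals
    2(a^2 - 2)/(a + 2)^2, which is negative when a^2 < 2.
    """
    if a <= 0:
        return Fraction(1)          # 1 is in A and 1 > a
    return (2 * a + 2) / (a + 2)


# Climb inside A starting from 1, using exact fractions throughout.
a = Fraction(1)
chain = [a]
for _ in range(5):
    a = larger_member(a)
    chain.append(a)
```

Starting from 1 the loop produces 4/3, 7/5, 24/17, 41/29, 140/99, each larger than the last and each still in A. No member of A can be the largest, although every one of them stays below the gap.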


It is convenient to say that A|B is a rational cut if it is like the cut in (i): For
some fixed rational number c, A is the set of all rationals < c while B is the rest of Q.
The B-set of a rational cut contains a smallest element c, and conversely, if A|B is a
cut in Q and B contains a smallest element c then A|B is the rational cut at c. We
write c* for the rational cut at c. This lets us think of Q ⊂ R by identifying c with
c*. It is like thinking of Z as a subset of Q since the integer n in Z can be thought of
as the fraction n/1 in Q. In the same way the rational number c in Q can be thought
of as the cut at c. It is just a different way of looking at c. It is in this sense that we
write

N ⊂ Z ⊂ Q ⊂ R.


There is an order relation x ≤ y on cuts that fairly cries out for attention.


Definition If x = A|B and y = C|D are cuts such that A ⊂ C then x is less than
or equal to y and we write x ≤ y. If A ⊂ C and A ≠ C then x is less than y and
we write x < y.


The property distinguishing R from Q and which is at the bottom of every
significant theorem about R involves upper bounds and least upper bounds or, equivalently,
lower bounds and greatest lower bounds.

M ∈ R is an upper bound for a set S ⊂ R if each s ∈ S satisfies

s ≤ M.

We also say that the set S is bounded above by M. An upper bound for S that
is less than all other upper bounds for S is a least upper bound for S. The least
upper bound for S is denoted l.u.b.(S). For example,

3 is an upper bound for the set of negative integers.

-1 is the least upper bound for the set of negative integers.

1 is the least upper bound for the set of rational numbers 1 - 1/n with n ∈ N.

-100 is an upper bound for the empty set.

A least upper bound for S may or may not belong to S. This is why you should say 
“least upper bound for S” rather than “least upper bound of S.” 




2 Theorem The set $\mathbb{R}$, constructed by means of Dedekind cuts, is complete† in the
sense that it satisfies the

Least Upper Bound Property: If $S$ is a nonempty subset of $\mathbb{R}$
and is bounded above then in $\mathbb{R}$ there exists a least upper bound for $S$.

Proof Easy! Let $\mathcal{C} \subset \mathbb{R}$ be any nonempty collection of cuts which is bounded above,
say by the cut $X|Y$. Define

\[ C = \{a \in \mathbb{Q} : \text{for some cut } A|B \in \mathcal{C} \text{ we have } a \in A\} \quad\text{and}\quad D = \text{the rest of } \mathbb{Q}. \]

It is easy to see that $z = C|D$ is a cut. Clearly, it is an upper bound for $\mathcal{C}$ since the
$A$ for every element of $\mathcal{C}$ is contained in $C$. Let $z' = C'|D'$ be any upper bound for
$\mathcal{C}$. By the assumption that $A|B \leq C'|D'$ for all $A|B \in \mathcal{C}$, we see that the $A$ for every
member of $\mathcal{C}$ is contained in $C'$. Hence $C \subset C'$, so $z \leq z'$. That is, among all upper
bounds for $\mathcal{C}$, $z$ is least. □
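The construction in the proof, taking the union of the left sets, can be mimicked for a finite family of rational cuts. This is a sketch under that restriction: the helper names `rational_cut` and `lub` are hypothetical, a cut is modeled only by the membership test of its left set, and the theorem itself covers arbitrary bounded families, which no finite program can enumerate.

```python
from fractions import Fraction

def rational_cut(c):
    """Left set of the rational cut c*: all rationals r with r < c."""
    return lambda r: r < c

def lub(cuts):
    """Left set of the least upper bound: the union of the given left sets."""
    return lambda r: any(a(r) for a in cuts)

# A finite family of cuts, standing in for the collection in the proof.
family = [
    rational_cut(Fraction(1, 2)),
    rational_cut(Fraction(2, 3)),
    rational_cut(Fraction(3, 5)),
]

z = lub(family)

# r belongs to the union exactly when it lies in some left set,
# which here means r < 2/3.
print(z(Fraction(13, 20)), z(Fraction(2, 3)), z(Fraction(7, 10)))
```

For this family the union is the left set of $(2/3)^*$, the largest of the three cuts, as the proof predicts.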


The simplicity of this proof is what makes cuts good. We go from $\mathbb{Q}$ to $\mathbb{R}$ by
pure thought. To be more complete, as it were, we describe the natural arithmetic
of cuts. Let cuts $x = A|B$ and $y = C|D$ be given. How do we add them? subtract
them? . . . Generally the answer is to do the corresponding operation to the elements
comprising the two halves of the cuts, being careful about negative numbers. The
sum of $x$ and $y$ is $x + y = E|F$ where

\[ E = \{r \in \mathbb{Q} : \text{for some } a \in A \text{ and for some } c \in C \text{ we have } r = a + c\} \]

\[ F = \text{the rest of } \mathbb{Q}. \]

It is easy to see that $E|F$ is a cut in $\mathbb{Q}$ and that it doesn’t depend on the order in
which $x$ and $y$ appear. That is, cut addition is well defined and $x + y = y + x$. The
zero cut is $0^*$ and $0^* + x = x$ for all $x \in \mathbb{R}$. The additive inverse of $x = A|B$ is
$-x = C|D$ where

\[ C = \{r \in \mathbb{Q} : \text{for some } b \in B, \text{ not the smallest element of } B,\ r = -b\} \]

\[ D = \text{the rest of } \mathbb{Q}. \]

Then $(-x) + x = 0^*$. Correspondingly, the difference of cuts is $x - y = x + (-y)$.
Another property of cut addition is associativity:

\[ (x + y) + z = x + (y + z). \]

This follows from the corresponding property of $\mathbb{Q}$.

† There is another, related, sense in which $\mathbb{R}$ is complete. See Theorem 5 below.

Multiplication is trickier to define. It helps to first say that the cut $x = A|B$ is
positive if $0^* < x$ or negative if $x < 0^*$. Since 0 lies in $A$ or $B$, a cut is either
positive, negative, or zero. If $x = A|B$ and $y = C|D$ are positive cuts then their
product is $x \cdot y = E|F$ where

\[ E = \{r \in \mathbb{Q} : r \leq 0 \text{ or } \exists\, a \in A \text{ and } \exists\, c \in C \text{ such that } a > 0,\ c > 0, \text{ and } r = ac\} \]

and $F$ is the rest of $\mathbb{Q}$. If $x$ is positive and $y$ is negative then we define the product
to be $-(x \cdot (-y))$. Since $x$ and $-y$ are both positive cuts this makes sense and is
a negative cut. Similarly, if $x$ is negative and $y$ is positive then by definition their
product is the negative cut $-((-x) \cdot y)$, while if $x$ and $y$ are both negative then their
product is the positive cut $(-x) \cdot (-y)$. Finally, if $x$ or $y$ is the zero cut $0^*$ we define
$x \cdot y$ to be $0^*$. (This makes five cases in the definition.)

Verifying the arithmetic properties for multiplication is tedious, to say the least,
and somehow nothing seems to be gained by writing out every detail. (To pursue
cut arithmetic further you could read Landau’s classically boring book, Foundations
of Analysis.) To get the flavor of it, let’s check the commutativity of multiplication:
$x \cdot y = y \cdot x$ for cuts $x = A|B$, $y = C|D$. If $x, y$ are positive then

\[ \{ac : a \in A,\ c \in C,\ a > 0,\ c > 0\} = \{ca : c \in C,\ a \in A,\ c > 0,\ a > 0\} \]

implies that $x \cdot y = y \cdot x$. If $x$ is positive and $y$ is negative then

\[ x \cdot y = -(x \cdot (-y)) = -((-y) \cdot x) = y \cdot x. \]

The second equality holds because we have already checked commutativity for positive
cuts. The remaining three cases are checked similarly. There are twenty seven cases
to check for associativity and twenty seven more for distributivity. All are simple
and we omit their proofs. The real point is that cut arithmetic can be defined and it
satisfies the same field properties that $\mathbb{Q}$ does:


The operation of cut addition is
well defined, natural, commutative, associative, and
has inverses with respect to the neutral element $0^*$.

The operation of cut multiplication
is well defined, natural, commutative, associative,
distributive over cut addition, and has inverses of
nonzero elements with respect to the neutral element $1^*$.

By definition, a field is a system consisting of a set of elements and two operations,
addition and multiplication, that have the preceding algebraic properties -
commutativity, associativity, etc. Besides just existing, cut arithmetic is consistent
with $\mathbb{Q}$ arithmetic in the sense that if $c, r \in \mathbb{Q}$ then

\[ c^* + r^* = (c + r)^* \quad\text{and}\quad c^* \cdot r^* = (cr)^*. \]

By definition, this is what we mean when we say that Q is a subfield of R. The cut 
order enjoys the additional properties of 

transitivity x < y < z implies x < z. 

trichotomy Either x < y, y < x, or x = y, but only one of the three things 
is true. 

translation x < y implies x + z < y + z. 

By definition, this is what we mean when we say that R is an ordered field. 
Besides, the product of positive cuts is positive and cut order is consistent with Q 
order: c* < r* if and only if c < r in Q. By definition, this is what we mean when we 
say that Q is an ordered subfield of R. To summarize 

3 Theorem The set R of all cuts in Q is a complete ordered field that contains Q 
as an ordered subfield. 


The magnitude or absolute value of $x \in \mathbb{R}$ is

\[ |x| = \begin{cases} x & \text{if } x \geq 0 \\ -x & \text{if } x < 0. \end{cases} \]

Thus, $x \leq |x|$. A basic, constantly used fact about magnitude is the following.

4 Triangle Inequality For all $x, y \in \mathbb{R}$ we have $|x + y| \leq |x| + |y|$.

Proof The translation and transitivity properties of the order relation imply that
adding $y$ and $-y$ to the inequalities $x \leq |x|$ and $-x \leq |x|$ gives

\[ x + y \leq |x| + y \leq |x| + |y| \]
\[ -x - y \leq |x| - y \leq |x| + |y|. \]

Since

\[ |x + y| = \begin{cases} x + y & \text{if } x + y \geq 0 \\ -x - y & \text{if } x + y < 0 \end{cases} \]

and both $x + y$ and $-x - y$ are less than or equal to $|x| + |y|$, we infer that
$|x + y| \leq |x| + |y|$ as asserted. □
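Next, suppose we try the same cut construction in $\mathbb{R}$ that we did in $\mathbb{Q}$. Are there
gaps in $\mathbb{R}$ that can be detected by cutting $\mathbb{R}$ with scissors? The natural definition of
a cut in $\mathbb{R}$ is a division $\mathcal{A}|\mathcal{B}$, where $\mathcal{A}$ and $\mathcal{B}$ are disjoint, nonempty subcollections of
$\mathbb{R}$ with $\mathcal{A} \cup \mathcal{B} = \mathbb{R}$, and $a < b$ for all $a \in \mathcal{A}$ and $b \in \mathcal{B}$. Further, $\mathcal{A}$ contains no largest
element. Each $b \in \mathcal{B}$ is an upper bound for $\mathcal{A}$. Therefore $y = \text{l.u.b.}(\mathcal{A})$ exists and
$a \leq y \leq b$ for all $a \in \mathcal{A}$ and $b \in \mathcal{B}$. By trichotomy,

\[ \mathcal{A}|\mathcal{B} = \{x \in \mathbb{R} : x < y\} \,|\, \{x \in \mathbb{R} : x \geq y\}. \]

In other words, $\mathbb{R}$ has no gaps. Every cut in $\mathbb{R}$ occurs exactly at a real number.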


Allied to the existence of $\mathbb{R}$ is its uniqueness. Any complete ordered field $\mathbb{F}$
containing $\mathbb{Q}$ as an ordered subfield corresponds to $\mathbb{R}$ in a way preserving all the
ordered field structure. To see this, take any $\varphi \in \mathbb{F}$ and associate to it the cut $A|B$
where

\[ A = \{r \in \mathbb{Q} : r < \varphi \text{ in } \mathbb{F}\} \qquad B = \text{the rest of } \mathbb{Q}. \]

This correspondence makes $\mathbb{F}$ equivalent to $\mathbb{R}$.


Upshot The real number system $\mathbb{R}$ exists and it satisfies the properties of a complete
ordered field. The properties are not assumed as axioms, but are proved by logically
analyzing the Dedekind construction of $\mathbb{R}$. Having gone through all this cut rigmarole,
we must remark that it is a rare working mathematician who actually thinks of $\mathbb{R}$ as
a complete ordered field or as the set of all cuts in $\mathbb{Q}$. Rather, he or she thinks of $\mathbb{R}$
as points on the $x$-axis, just as in calculus. You too should picture $\mathbb{R}$ this way, the
only benefit of the cut derivation being that you should now unhesitatingly accept
the least upper bound property of $\mathbb{R}$ as a true fact.


Note $\pm\infty$ are not real numbers, since $\mathbb{Q}|\emptyset$ and $\emptyset|\mathbb{Q}$ are not cuts. Although some
mathematicians think of $\mathbb{R}$ together with $-\infty$ and $+\infty$ as an “extended real number
system,” it is simpler to leave well enough alone and just deal with $\mathbb{R}$ itself.
Nevertheless, it is convenient to write expressions like “$x \to \infty$” to indicate that a real
variable $x$ grows larger and larger without bound.


If $S$ is a nonempty subset of $\mathbb{R}$ then its supremum is its least upper bound when
$S$ is bounded above and is said to be $+\infty$ otherwise; its infimum is its greatest lower
bound when $S$ is bounded below and is said to be $-\infty$ otherwise. (In Exercise 19 you
are asked to invent the notion of greatest lower bound.) By definition the supremum
of the empty set is $-\infty$. This is reasonable, considering that every real number, no
matter how negative, is an upper bound for $\emptyset$, and the least upper bound should be
as far leftward as possible, namely $-\infty$. Similarly, the infimum of the empty set is
$+\infty$. We write $\sup S$ and $\inf S$ for the supremum and infimum of $S$.
Cauchy sequences

As mentioned above there is a second sense in which $\mathbb{R}$ is complete. It involves the
concept of convergent sequences. Let $a_1, a_2, a_3, a_4, \ldots = (a_n)$, $n \in \mathbb{N}$, be a sequence
of real numbers. The sequence $(a_n)$ converges to the limit $b \in \mathbb{R}$ as $n \to \infty$
provided that for each $\epsilon > 0$ there exists $N \in \mathbb{N}$ such that for all $n \geq N$ we have

\[ |a_n - b| < \epsilon. \]

The statistician’s language is evocative here. Think of $n = 1, 2, \ldots$ as a sequence of
times and say that the sequence $(a_n)$ converges to $b$ provided that eventually all its
terms nearly equal $b$. In symbols,

\[ \forall \epsilon > 0 \ \exists N \in \mathbb{N} \text{ such that } n \geq N \ \Rightarrow\ |a_n - b| < \epsilon. \]

If the limit $b$ exists it is not hard to see (Exercise 20) that it is unique, and we write

\[ \lim_{n \to \infty} a_n = b \quad\text{or}\quad a_n \to b. \]

Suppose that $\lim_{n \to \infty} a_n = b$. Since all the numbers $a_n$ are eventually near $b$ they are
all near each other; i.e., every convergent sequence obeys a Cauchy condition:

\[ \forall \epsilon > 0 \ \exists N \in \mathbb{N} \text{ such that if } n, k \geq N \text{ then } |a_n - a_k| < \epsilon. \]

The converse of this fact is a fundamental property of $\mathbb{R}$.

5 Theorem $\mathbb{R}$ is complete with respect to Cauchy sequences in the sense that if
$(a_n)$ is a sequence of real numbers which obeys a Cauchy condition then it converges
to a limit in $\mathbb{R}$.


Proof First we show that $(a_n)$ is bounded. Taking $\epsilon = 1$ in the Cauchy condition
implies there is an $N$ such that for all $n, k \geq N$ we have $|a_n - a_k| < 1$. Take $K$ large
enough that $-K \leq a_1, \ldots, a_N \leq K$. Set $M = K + 1$. Then for all $n$ we have

\[ -M < a_n < M \]

which shows that the sequence is bounded.
Define a set $X$ as

\[ X = \{x \in \mathbb{R} : \exists \text{ infinitely many } n \text{ such that } a_n \geq x\}. \]

$-M \in X$ since for all $n$ we have $a_n > -M$, while $M \notin X$ since no $a_n$ is $\geq M$. Thus
$X$ is a nonempty subset of $\mathbb{R}$ which is bounded above by $M$. The least upper bound
property applies to $X$ and we have $b = \text{l.u.b.}\, X$ with $-M \leq b \leq M$.
We claim that $a_n$ converges to $b$ as $n \to \infty$. Given $\epsilon > 0$ we must show there is
an $N$ such that for all $n \geq N$ we have $|a_n - b| < \epsilon$. Since $(a_n)$ is Cauchy and $\epsilon/2$ is
positive there does exist an $N$ such that if $n, k \geq N$ then

\[ |a_n - a_k| < \frac{\epsilon}{2}. \]

Since $b - \epsilon/2$ is less than $b$ it is not an upper bound for $X$, so there is $x \in X$ with
$b - \epsilon/2 < x$. For infinitely many $n$ we have $a_n \geq x$. Since $b + \epsilon/2 > b$ it does not
belong to $X$, and therefore for only finitely many $n$ do we have $a_n \geq b + \epsilon/2$. Thus,
for infinitely many $n$ we have

\[ b - \frac{\epsilon}{2} < x \leq a_n < b + \frac{\epsilon}{2}. \]

Since there are infinitely many of these $n$ there are infinitely many that are $\geq N$.
Pick one, say $a_{n_0}$ with $n_0 \geq N$ and $b - \epsilon/2 < a_{n_0} < b + \epsilon/2$. Then for all $n \geq N$ we
have

\[ |a_n - b| \leq |a_n - a_{n_0}| + |a_{n_0} - b| < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon \]

which completes the verification that $(a_n)$ converges. See Figure 4. □


Figure 4 For all $n \geq N$ we have $|a_n - b| < \epsilon$. (The figure marks the points $-M$, $b - \epsilon/2$, $a_n$, $a_{n_0}$, $b + \epsilon/2$, and $M$ on the real line.)


Restating Theorem 5 gives the

6 Cauchy Convergence Criterion A sequence $(a_n)$ in $\mathbb{R}$ converges if and only if

\[ \forall \epsilon > 0 \ \exists N \in \mathbb{N} \text{ such that } n, k \geq N \ \Rightarrow\ |a_n - a_k| < \epsilon. \]
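The Cauchy condition can be probed numerically, with the caveat that a finite computation only inspects finitely many pairs $n, k$ and so can illustrate the criterion but never establish it. The sketch below uses the partial sums $a_n = \sum_{k=1}^{n} 1/k^2$ as an example sequence; this choice, the helper names, and the finite `horizon` parameter are assumptions made for illustration only.

```python
from fractions import Fraction

def partial_sum(n):
    """a_n = 1/1^2 + 1/2^2 + ... + 1/n^2, computed exactly."""
    return sum(Fraction(1, k * k) for k in range(1, n + 1))

def max_gap(N, horizon):
    """Largest |a_n - a_k| over all N <= n, k <= horizon."""
    terms = [partial_sum(n) for n in range(N, horizon + 1)]
    return max(terms) - min(terms)

# For this sequence the tail beyond N is less than 1/N, so the
# gap among sampled terms past N = 10 stays below 1/10.
gap = max_gap(10, 60)
print(float(gap))
```

Because the sequence is increasing, the largest gap is simply $a_{60} - a_{10}$, and raising `N` shrinks it, which is what the criterion demands for every $\epsilon$.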


Further description of R 


The elements of $\mathbb{R} \setminus \mathbb{Q}$ are irrational numbers. If $x$ is irrational and $r$ is rational
then $y = x + r$ is irrational. For if $y$ is rational then so is $y - r = x$, the difference of
rationals being rational. Similarly, if $r \neq 0$ then $rx$ is irrational. It follows that the
reciprocal of an irrational number is irrational. From these observations we will show
that the rational and irrational numbers are thoroughly mixed up with each other.


Let $a < b$ be given in $\mathbb{R}$. Define the intervals $(a, b)$ and $[a, b]$ as

\[ (a, b) = \{x \in \mathbb{R} : a < x < b\} \]
\[ [a, b] = \{x \in \mathbb{R} : a \leq x \leq b\}. \]
7 Theorem Every interval $(a, b)$, no matter how small, contains both rational and
irrational numbers. In fact it contains infinitely many rational numbers and infinitely
many irrational numbers.


Proof Think of $a, b$ as cuts $a = A|A'$, $b = B|B'$. The fact that $a < b$ implies the set
$B \setminus A$ is a nonempty set of rational numbers. Choose a rational $r \in B \setminus A$. Since $B$
has no largest element, there is a rational $s$ with $a < r < s < b$. Now consider the
transformation

\[ T : t \mapsto r + (s - r)t. \]

It sends the interval $[0, 1]$ to the interval $[r, s]$. Since $r$ and $s - r$ are rational, $T$ sends
rationals to rationals and irrationals to irrationals. Clearly $[0, 1]$ contains infinitely
many rationals, say $1/n$ with $n \in \mathbb{N}$, so $[r, s]$ contains infinitely many rationals. Also
$[0, 1]$ contains infinitely many irrationals, say $1/(n\sqrt{2})$ with $n \in \mathbb{N}$, so $[r, s]$ contains
infinitely many irrationals. Since $[r, s]$ contains infinitely many rationals and infinitely
many irrationals, the same is true of the larger interval $(a, b)$. □
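The transformation $T$ from the proof is easy to try out on exact rationals. In this sketch the endpoints $r = 1/3$ and $s = 1/2$ are arbitrary illustrative values, and only the rational half of the argument is shown, since irrational inputs such as $1/(n\sqrt{2})$ cannot be represented exactly in this way.

```python
from fractions import Fraction

def T(r, s, t):
    """The affine map t -> r + (s - r) t sending [0, 1] onto [r, s]."""
    return r + (s - r) * t

# Hypothetical rational endpoints with r < s.
r, s = Fraction(1, 3), Fraction(1, 2)

# Images of the rationals 1/1, 1/2, ..., 1/5 from [0, 1].
rationals = [T(r, s, Fraction(1, n)) for n in range(1, 6)]
print(rationals)
```

Each image is again a `Fraction`, the images are pairwise distinct, and all of them land in $[r, s]$, mirroring the claim that $T$ carries the rationals $1/n$ to infinitely many rationals between $r$ and $s$.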


Theorem 7 expresses the fact that between any two rational numbers lies an irra- 
tional number, and between any two irrational numbers lies a rational number. This 
is a fact worth thinking about for it seems implausible at first. Spend some time 
trying to picture the situation, especially in light of the following related facts: 

(a) There is no first (i.e., smallest) rational number in the interval (0, 1). 

(b) There is no first irrational number in the interval (0, 1). 

(c) There are strictly more irrational numbers in the interval (0, 1) (in the cardi- 
nality sense explained in Section 4) than there are rational numbers. 

The transformation in the proof of Theorem 7 shows that the real line is like 
rubber: stretch it out and it never breaks. 

A somewhat obscure and trivial fact about $\mathbb{R}$ is its Archimedean property: for
each $x \in \mathbb{R}$ there is an integer $n$ that is greater than $x$. In other words, there exist
arbitrarily large integers. The Archimedean property is true for $\mathbb{Q}$ since $p/q \leq |p|$. It
follows that it is true for $\mathbb{R}$. Given $x = A|B$, just choose a rational number $r \in B$
and an integer $n > r$. Then $n > x$. An equivalent way to state the Archimedean
property is that there exist arbitrarily small reciprocals of integers.

Mildly interesting is the existence of ordered fields for which the Archimedean
property fails. One example is the field $\mathbb{R}(x)$ of rational functions with real
coefficients. Each such function is of the form

\[ R(x) = \frac{p(x)}{q(x)} \]
where $p$ and $q$ are polynomials with real coefficients and $q$ is not the zero polynomial.
(It does not matter that $q(x) = 0$ at a finite number of points.) Addition and
multiplication are defined in the usual fashion of high school algebra, and it is easy to
see that $\mathbb{R}(x)$ is a field. The order relation on $\mathbb{R}(x)$ is also easy to define. If $R(x) > 0$
for all sufficiently large $x$ then we say that $R$ is positive in $\mathbb{R}(x)$, and if $R - S$ is
positive then we write $S < R$. Since a nonzero rational function vanishes (has value
zero) at only finitely many $x \in \mathbb{R}$, we get trichotomy: either $R = S$, $R < S$, or $S < R$.
(To be rigorous, we need to prove that the values of a rational function do not change
sign for $x$ large enough.) The other order properties are equally easy to check, and
$\mathbb{R}(x)$ is an ordered field.

Is $\mathbb{R}(x)$ Archimedean? That is, given $R \in \mathbb{R}(x)$, does there exist a natural number
$n \in \mathbb{R}(x)$ such that $R < n$? (A number $n$ is the rational function whose numerator is
the constant polynomial $p(x) = n$, a polynomial of degree zero, and whose denominator
is the constant polynomial $q(x) = 1$.) The answer is “no.” Take $R(x) = x/1$. The
numerator is $x$ and the denominator is 1. Clearly we have $n < x$, not the opposite,
so $\mathbb{R}(x)$ fails to be Archimedean.

The same remarks hold for any positive rational function $R = p(x)/q(x)$ where
the degree of $p$ exceeds the degree of $q$. In $\mathbb{R}(x)$, $R$ is never less than a natural
number. (You might ask yourself: exactly which rational functions are less than $n$?)


The $\epsilon$-principle

Finally let us note a nearly trivial principle that turns out to be invaluable in
deriving inequalities and equalities in $\mathbb{R}$.


8 Theorem ($\epsilon$-principle) If $a, b$ are real numbers and if for each $\epsilon > 0$ we have
$a \leq b + \epsilon$ then $a \leq b$. If $x, y$ are real numbers and for each $\epsilon > 0$ we have $|x - y| \leq \epsilon$
then $x = y$.


Proof Trichotomy implies that either $a \leq b$ or $a > b$. In the latter case we can
choose $\epsilon$ with $0 < \epsilon < a - b$ and get the absurdity

\[ \epsilon < a - b \leq \epsilon. \]

Hence $a \leq b$. Similarly, if $x \neq y$ then choosing $\epsilon$ with $0 < \epsilon < |x - y|$ gives the
contradiction $\epsilon < |x - y| \leq \epsilon$. Hence $x = y$. See also Exercise 12. □
3 Euclidean Space

Given sets $A$ and $B$, the Cartesian product of $A$ and $B$ is the set $A \times B$ of all
ordered pairs $(a, b)$ such that $a \in A$ and $b \in B$. (The name comes from Descartes
who pioneered the idea of the $xy$-coordinate system in geometry.) See Figure 5.

Figure 5 The Cartesian product $A \times B$

The Cartesian product of $\mathbb{R}$ with itself $m$ times is denoted $\mathbb{R}^m$. Elements of $\mathbb{R}^m$
are vectors, ordered $m$-tuples of real numbers $(x_1, \ldots, x_m)$. In this terminology real
numbers are called scalars and $\mathbb{R}$ is called the scalar field. When vectors are added,
subtracted, and multiplied by scalars according to the rules

\[ (x_1, \ldots, x_m) + (y_1, \ldots, y_m) = (x_1 + y_1, \ldots, x_m + y_m) \]
\[ (x_1, \ldots, x_m) - (y_1, \ldots, y_m) = (x_1 - y_1, \ldots, x_m - y_m) \]
\[ c(x_1, \ldots, x_m) = (cx_1, \ldots, cx_m) \]

then these operations obey the natural laws of linear algebra: commutativity,
associativity, etc. There is another operation defined on $\mathbb{R}^m$, the dot product (also
called the scalar product or inner product). The dot product of $x = (x_1, \ldots, x_m)$ and
$y = (y_1, \ldots, y_m)$ is

\[ \langle x, y \rangle = x_1 y_1 + \cdots + x_m y_m. \]

Remember: the dot product of two vectors is a scalar, not a vector. The dot product 
operation is bilinear, symmetric, and positive definite; i.e., for any vectors x, y, z G R m 
and any $c \in \mathbb{R}$ we have

\[ \langle x, y + cz \rangle = \langle x, y \rangle + c\langle x, z \rangle \]
\[ \langle x, y \rangle = \langle y, x \rangle \]
\[ \langle x, x \rangle \geq 0 \text{ and } \langle x, x \rangle = 0 \text{ if and only if } x \text{ is the zero vector.} \]

The length or magnitude of a vector $x \in \mathbb{R}^m$ is defined to be

\[ |x| = \sqrt{\langle x, x \rangle} = \sqrt{x_1^2 + \cdots + x_m^2}. \]

See Exercise 16 which legalizes taking roots. Expressed in coordinate-free language,
the basic fact about the dot product is the


9 Cauchy-Schwarz Inequality For all $x, y \in \mathbb{R}^m$ we have $\langle x, y \rangle \leq |x||y|$.


Proof Tricky! For any vectors $x, y$ consider the new vector $w = x + ty$, where $t \in \mathbb{R}$
is a varying scalar. Then

\[ Q(t) = \langle w, w \rangle = \langle x + ty, x + ty \rangle \]

is a real-valued function of $t$. In fact, $Q(t) \geq 0$ since the dot product of any vector
with itself is nonnegative. The bilinearity properties of the dot product imply that

\[ Q(t) = \langle x, x \rangle + 2t\langle x, y \rangle + t^2\langle y, y \rangle = c + bt + at^2 \]

is a quadratic function of $t$. Nonnegative quadratic functions of $t \in \mathbb{R}$ have nonpositive
discriminants, $b^2 - 4ac \leq 0$. For if $b^2 - 4ac > 0$ then $Q(t)$ has two real roots, between
which $Q(t)$ is negative. See Figure 6.

Figure 6 Quadratic graphs. Panels: $b^2 - 4ac < 0$; $b^2 - 4ac = 0$ ($Q$ non-negative, one double root); $b^2 - 4ac > 0$ ($Q$ both positive and negative, two real roots).
But $b^2 - 4ac \leq 0$ means that $4\langle x, y \rangle^2 - 4\langle x, x \rangle\langle y, y \rangle \leq 0$, i.e.,

\[ \langle x, y \rangle^2 \leq \langle x, x \rangle\langle y, y \rangle. \]

Taking the square root of both sides gives $\langle x, y \rangle \leq \sqrt{\langle x, x \rangle}\sqrt{\langle y, y \rangle} = |x||y|$. (We use
Exercise 17 here and below without further mention.) □
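The discriminant argument can be checked on concrete vectors. The sketch below uses integer coordinates so that all arithmetic is exact; the function names and the sample vectors are illustrative assumptions, and a few examples are of course no substitute for the proof.

```python
def dot(x, y):
    """Dot product <x, y> = x_1 y_1 + ... + x_m y_m."""
    return sum(a * b for a, b in zip(x, y))

def discriminant(x, y):
    """b^2 - 4ac for Q(t) = <x,x> + 2t<x,y> + t^2<y,y> = c + bt + at^2."""
    a, b, c = dot(y, y), 2 * dot(x, y), dot(x, x)
    return b * b - 4 * a * c

x, y = (1, 2, 3), (4, -5, 6)

# A nonpositive discriminant is the same statement as <x,y>^2 <= <x,x><y,y>.
print(discriminant(x, y))

# Parallel vectors give a double root, so the discriminant vanishes.
print(discriminant((1, 2), (2, 4)))
```

The second case shows when the inequality becomes an equality: $Q(t)$ touches zero exactly when $x + ty$ is the zero vector for some $t$.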


The Cauchy-Schwarz inequality implies easily the Triangle Inequality for vectors:
For all $x, y \in \mathbb{R}^m$ we have

\[ |x + y| \leq |x| + |y|. \]

For $|x + y|^2 = \langle x + y, x + y \rangle = \langle x, x \rangle + 2\langle x, y \rangle + \langle y, y \rangle$. By Cauchy-Schwarz,
$2\langle x, y \rangle \leq 2|x||y|$. Thus,

\[ |x + y|^2 \leq |x|^2 + 2|x||y| + |y|^2 = (|x| + |y|)^2. \]

Taking the square root of both sides gives the result.

The Euclidean distance between vectors $x, y \in \mathbb{R}^m$ is defined as the length of
their difference,

\[ |x - y| = \sqrt{\langle x - y, x - y \rangle} = \sqrt{(x_1 - y_1)^2 + \cdots + (x_m - y_m)^2}. \]

From the Triangle Inequality for vectors follows the Triangle Inequality for distance.
For all $x, y, z \in \mathbb{R}^m$ we have

\[ |x - z| \leq |x - y| + |y - z|. \]

To prove it, think of $x - z$ as the vector sum $(x - y) + (y - z)$ and apply the Triangle
Inequality for vectors. See Figure 7.

Geometric intuition in Euclidean space can carry you a long way in real analysis, 
especially in being able to forecast whether a given statement is true or not. Your 
geometric intuition will grow with experience and contemplation. We begin with 
some vocabulary. 

In real analysis, vectors in $\mathbb{R}^m$ are referred to as points in $\mathbb{R}^m$. The $j$th coordinate
of the point $(x_1, \ldots, x_m)$ is the number $x_j$ appearing in the $j$th position. The $j$th
coordinate axis is the set of points $x \in \mathbb{R}^m$ whose $k$th coordinates are zero for all
$k \neq j$. The origin of $\mathbb{R}^m$ is the zero vector, $(0, \ldots, 0)$. The first orthant of $\mathbb{R}^m$ is
the set of points $x \in \mathbb{R}^m$ all of whose coordinates are nonnegative. When $m = 2$,
the first orthant is the first quadrant. The integer lattice is the set $\mathbb{Z}^m \subset \mathbb{R}^m$ of
ordered $m$-tuples of integers. The integer lattice is also called the integer grid. See
Figure 8.

Figure 7 How the Triangle Inequality gets its name

Figure 8 The integer lattice and first quadrant

A box is a Cartesian product of intervals

\[ [a_1, b_1] \times \cdots \times [a_m, b_m] \]

in $\mathbb{R}^m$. (A box is also called a rectangular parallelepiped.) The unit cube in
$\mathbb{R}^m$ is the box $[0, 1]^m = [0, 1] \times \cdots \times [0, 1]$. See Figure 9.



Figure 9 A box and a cube 


The unit ball and unit sphere in $\mathbb{R}^m$ are the sets

\[ B^m = \{x \in \mathbb{R}^m : |x| \leq 1\} \]
\[ S^{m-1} = \{x \in \mathbb{R}^m : |x| = 1\}. \]

The reason for the exponent $m - 1$ is that the sphere is $(m - 1)$-dimensional as
an object in its own right although it does live in $m$-space. In 3-space, the surface of
a ball is a two-dimensional film, the 2-sphere $S^2$. See Figure 10.

A set $E \subset \mathbb{R}^m$ is convex if for each pair of points $x, y \in E$, the straight line
segment between $x$ and $y$ is also contained in $E$. The unit ball is an example of a
convex set. To see this, take any two points in $B^m$ and draw the segment between
them. It “obviously” lies in $B^m$. See Figure 11.

Figure 10 A 2-disc $B^2$ with its boundary circle, and a 2-sphere $S^2$ with its equator

Figure 11 Convexity of the ball

To give a mathematical proof, it is useful to describe the line segment between
$x$ and $y$ with a formula. The straight line determined by distinct points $x, y \in \mathbb{R}^m$
is the set of all linear combinations $sx + ty$ where $s + t = 1$, and the line segment is
the set of these linear combinations where $s$ and $t$ are $\leq 1$. Such linear combinations
$sx + ty$ with $s + t = 1$ and $0 \leq s, t \leq 1$ are called convex combinations. The line
segment is denoted as $[x, y]$. (This notation is consistent with the interval notation
$[a, b]$. See Exercise 27.) Now if $x, y \in B^m$ and $sx + ty = z$ is a convex combination of
$x$ and $y$ then, using the Cauchy-Schwarz Inequality and the fact that $2st \geq 0$, we get

\[ \langle z, z \rangle = s^2\langle x, x \rangle + 2st\langle x, y \rangle + t^2\langle y, y \rangle \leq s^2|x|^2 + 2st|x||y| + t^2|y|^2 \leq s^2 + 2st + t^2 = (s + t)^2 = 1. \]

Taking the square root of both sides gives $|z| \leq 1$, which proves convexity of the
ball.
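The convexity estimate can be watched in action on a sample segment. This is a sketch with assumed data: the two points $x = (3/5, 4/5)$ and $y = (-1, 0)$ are chosen on the unit circle in $\mathbb{R}^2$ so that exact rational arithmetic applies, and the helper names are hypothetical.

```python
from fractions import Fraction

def dot(x, y):
    """Dot product of two vectors given as tuples."""
    return sum(a * b for a, b in zip(x, y))

def convex_combination(s, x, y):
    """The point sx + ty with t = 1 - s, for 0 <= s <= 1."""
    t = 1 - s
    return tuple(s * a + t * b for a, b in zip(x, y))

# Two points of the unit ball in R^2 (both have length exactly 1).
x = (Fraction(3, 5), Fraction(4, 5))
y = (Fraction(-1), Fraction(0))

# <z, z> for z = sx + ty at s = 0, 1/10, ..., 1.
squared_lengths = [
    dot(z, z)
    for z in (convex_combination(Fraction(k, 10), x, y) for k in range(11))
]
print([float(q) for q in squared_lengths])
```

Every value of $\langle z, z \rangle$ stays at or below 1, with equality only at the two endpoints, as the displayed chain of inequalities predicts.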
Inner product spaces


An inner product on a vector space $V$ is an operation $\langle\ ,\ \rangle$ on pairs of vectors
in V that satisfies the same conditions that the dot product in Euclidean space does: 
Namely, bilinearity, symmetry, and positive definiteness. A vector space equipped 
with an inner product is an inner product space. The discriminant proof of the 
Cauchy-Schwarz Inequality is valid for any inner product defined on any real vector 
space, even if the space is infinite-dimensional and the standard coordinate proof 
would make no sense. For the discriminant proof uses only the inner product prop- 
erties, and not the particular definition of the dot product in Euclidean space. 

$\mathbb{R}^m$ has dimension $m$ because it has a basis $e_1, \ldots, e_m$. Other vector spaces are
more general. For example, let $C([a, b], \mathbb{R})$ denote the set of all continuous
real-valued functions defined on the interval $[a, b]$. (See Section 6 or your old calculus
book for the definition of continuity.) It is a vector space in a natural way, the
sum of continuous functions being continuous and the scalar multiple of a continuous
function being continuous. The vector space $C([a, b], \mathbb{R})$, however, has no finite basis.
It is infinite-dimensional. Even so, there is a natural inner product,

\[ \langle f, g \rangle = \int_a^b f(x)g(x)\,dx. \]

Cauchy-Schwarz applies to this inner product, just as to any inner product, and we
infer a general integral inequality valid for any two continuous functions,

\[ \int_a^b f(x)g(x)\,dx \leq \sqrt{\int_a^b f(x)^2\,dx}\ \sqrt{\int_a^b g(x)^2\,dx}. \]


It would be challenging to prove such an inequality from scratch, would it not? See 
also the first paragraph of the next chapter. 
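A numerical spot check of the integral inequality is straightforward, bearing in mind that it replaces the integrals by approximations. The sketch below assumes the sample functions $f(x) = x$ and $g(x) = x^2$ on $[0, 1]$ and a midpoint Riemann sum with a hypothetical helper `midpoint_integral`; none of these choices come from the text.

```python
import math

def midpoint_integral(h, a, b, n=1000):
    """Midpoint Riemann sum approximating the integral of h over [a, b]."""
    width = (b - a) / n
    return width * sum(h(a + (i + 0.5) * width) for i in range(n))

def f(x):
    return x

def g(x):
    return x * x

# Left side: the inner product <f, g>.
lhs = midpoint_integral(lambda x: f(x) * g(x), 0.0, 1.0)

# Right side: the product of the lengths of f and g.
rhs = math.sqrt(midpoint_integral(lambda x: f(x) ** 2, 0.0, 1.0)) * math.sqrt(
    midpoint_integral(lambda x: g(x) ** 2, 0.0, 1.0)
)
print(lhs, rhs)
```

For these functions the exact values are $1/4$ on the left and $1/\sqrt{15}$ on the right. A Riemann sum is itself a dot product of sampled values, so the approximate inequality is an instance of the finite-dimensional Cauchy-Schwarz Inequality.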

A norm on a vector space $V$ is any function $|\ | : V \to \mathbb{R}$ with the three properties
of vector length: Namely, if $v, w \in V$ and $\lambda \in \mathbb{R}$ then

\[ |v| \geq 0 \text{ and } |v| = 0 \text{ if and only if } v = 0, \]
\[ |\lambda v| = |\lambda||v|, \]
\[ |v + w| \leq |v| + |w|. \]

An inner product $\langle\ ,\ \rangle$ defines a norm as $|v| = \sqrt{\langle v, v \rangle}$, but not all norms come
from inner products. The unit sphere $\{v \in V : \langle v, v \rangle = 1\}$ for every inner product is
smooth (has no corners) while for the norm

\[ |v|_{\max} = \max\{|v_1|, |v_2|\} \]

defined on $v = (v_1, v_2) \in \mathbb{R}^2$, the unit sphere is the perimeter of the square $\{(v_1, v_2) \in
\mathbb{R}^2 : |v_1| \leq 1 \text{ and } |v_2| \leq 1\}$. It has corners and so it does not arise from an inner
product. See Exercises 46, 47, and the Manhattan metric on page 76.

The simplest Euclidean space beyond $\mathbb{R}$ is the plane $\mathbb{R}^2$. Its $xy$-coordinates can
be used to define a multiplication,

\[ (x, y) \cdot (x', y') = (xx' - yy',\ xy' + x'y). \]

The point $(1, 0)$ corresponds to the multiplicative unit element 1, while the point $(0, 1)$
corresponds to $i = \sqrt{-1}$, which converts the plane to the field $\mathbb{C}$ of complex numbers.
Complex analysis is the study of functions of a complex variable, i.e., functions $f(z)$
where $z$ and $f(z)$ lie in $\mathbb{C}$. Complex analysis is the good twin and real analysis the
evil one: beautiful formulas and elegant theorems seem to blossom spontaneously in 
the complex domain, while toil and pathology rule the reals. Nevertheless, complex 
analysis relies more on real analysis than the other way around. 
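The multiplication rule on the plane can be compared with a built-in complex type. In this sketch the function name `multiply` and the sample points are illustrative; the comparison with Python's `complex` is a sanity check, not a definition.

```python
def multiply(p, q):
    """(x, y) . (x', y') = (xx' - yy', xy' + x'y) on pairs of numbers."""
    x, y = p
    u, v = q
    return (x * u - y * v, x * v + u * y)

one = (1, 0)
i = (0, 1)

# (0, 1) squares to (-1, 0), which is why it plays the role of sqrt(-1).
print(multiply(i, i))

# (1, 0) acts as the multiplicative unit.
print(multiply(one, (3, 4)))

# The same product computed with the built-in complex numbers.
print(multiply((1, 2), (3, 4)), complex(1, 2) * complex(3, 4))
```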


4 Cardinality

Let $A$ and $B$ be sets. A function $f : A \to B$ is a rule or mechanism which, when
presented with any element $a \in A$, produces an element $b = f(a)$ of $B$. It need not
be defined by a formula. Think of a function as a device into which you feed elements
of $A$ and out of which pour elements of $B$. See Figure 12. We also call $f$ a mapping
or a map or a transformation. The set $A$ is the domain of the function and $B$ is
its target, also called its codomain. The range or image of $f$ is the subset of the
target

\[ \{b \in B : \text{there exists at least one element } a \in A \text{ with } f(a) = b\}. \]

See Figure 13.

Figure 12 The function $f$ as a machine

Figure 13 The domain, target, and range of a function

Try to write $f$ instead of $f(x)$ to denote a function. The function is the device
which when confronted with input $x$ produces output $f(x)$. The function is the
device, not the output.

Think also of a function dynamically. At time zero all the elements of $A$ are
sitting peacefully in $A$. Then the function applies itself to them and throws them
into $B$. At time one all the elements that were formerly in $A$ are now transferred into
$B$. Each $a \in A$ gets sent to some element $f(a) \in B$.

A mapping $f : A \to B$ is an injection (or is one-to-one) if for each pair of
distinct elements $a, a' \in A$, the elements $f(a), f(a')$ are distinct in $B$. That is,

\[ a \neq a' \ \Rightarrow\ f(a) \neq f(a'). \]

The mapping $f$ is a surjection (or is onto) if for each $b \in B$ there is at least one
$a \in A$ such that $f(a) = b$. That is, the range of $f$ is $B$.
A mapping is a bijection if it is both injective and surjective. It is one-to-one
and onto. If $f : A \to B$ is a bijection then the inverse map $f^{-1} : B \to A$ is a bijection
where $f^{-1}(b)$ is by definition the unique element $a \in A$ such that $f(a) = b$.


The identity map of any set to itself is the bijection that takes each $a \in A$ and
sends it to itself, $\mathrm{id}(a) = a$.

If $f : A \to B$ and $g : B \to C$ then the composite $g \circ f : A \to C$ is the function
that sends $a \in A$ to $g(f(a)) \in C$. If $f$ and $g$ are injective then so is $g \circ f$, while if $f$
and $g$ are surjective then so is $g \circ f$.

In particular the composite of bijections is a bijection. If there is a bijection from
$A$ onto $B$ then $A$ and $B$ are said to have equal cardinality,† and we write $A \sim B$.
The relation $\sim$ is an equivalence relation. That is,

(a) $A \sim A$.

(b) $A \sim B$ implies $B \sim A$.

(c) $A \sim B \sim C$ implies $A \sim C$.

(a) follows from the fact that the identity map bijects $A$ to itself. (b) follows from
the fact that the inverse of a bijection $A \to B$ is a bijection $B \to A$. (c) follows from
the fact that the composite of bijections $f$ and $g$ is a bijection $g \circ f$.
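For finite sets these notions can be checked mechanically. The sketch below models a function as a Python dictionary from domain elements to target elements; the helper names and the three small sets are assumptions for illustration, and nothing here touches the infinite sets that the rest of the section is about.

```python
def is_injective(f):
    """Distinct inputs give distinct outputs."""
    return len(set(f.values())) == len(f)

def is_surjective(f, target):
    """The range of f is all of the target."""
    return set(f.values()) == set(target)

def compose(g, f):
    """The composite g o f, sending a to g(f(a))."""
    return {a: g[f[a]] for a in f}

def inverse(f):
    """The inverse map of a bijection f."""
    return {b: a for a, b in f.items()}

A, B, C = {1, 2, 3}, {"a", "b", "c"}, {10, 20, 30}
f = {1: "a", 2: "b", 3: "c"}        # a bijection A -> B
g = {"a": 10, "b": 20, "c": 30}     # a bijection B -> C

h = compose(g, f)                   # a bijection A -> C
print(h, is_injective(h), is_surjective(h, C))
print(compose(inverse(f), f))       # the identity map on A
```

The three printed facts correspond to properties (c), and (b) together with (a): composites of bijections are bijections, and composing a bijection with its inverse returns the identity.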

A set S is 


finite if it is empty or for some $n \in \mathbb{N}$ we have $S \sim \{1, \ldots, n\}$.

infinite if it is not finite. 

denumerable if S ~ N. 

countable if it is finite or denumerable. 

uncountable if it is not countable. 

† The word “cardinal” indicates the number of elements in the set. The cardinal numbers are
0, 1, 2, . . . The first infinite cardinal number is aleph null, $\aleph_0$. One says that $\mathbb{N}$ has $\aleph_0$ elements. A
mystery of math is the Continuum Hypothesis which states that $\mathbb{R}$ has cardinality $\aleph_1$, the second
infinite cardinal. Equivalently, if $\mathbb{N} \subset S \subset \mathbb{R}$, the Continuum Hypothesis asserts that $S \sim \mathbb{N}$ or
$S \sim \mathbb{R}$. No intermediate cardinalities exist. You can pursue this issue in Paul Cohen’s book, Set
Theory and the Continuum Hypothesis.

We also write $\operatorname{card} A = \operatorname{card} B$ and $\#A = \#B$ when $A, B$ have equal cardinality.

If $S$ is denumerable then there is a bijection $f : \mathbb{N} \to S$, and this gives a way to
list the elements of $S$ as $s_1 = f(1)$, $s_2 = f(2)$, $s_3 = f(3)$, etc. Conversely, if a set
$S$ is presented as an infinite list (without repetition) $S = \{s_1, s_2, s_3, \ldots\}$, then it is
denumerable: Define $f(k) = s_k$ for all $k \in \mathbb{N}$. In brief, denumerable = listable.

Let’s begin with a truly remarkable cardinality result, that although N and R are 
both infinite, R is more infinite than N. Namely, 

10 Theorem R is uncountable. 

Proof There are other proofs of the uncountability of R, but none so beautiful as 
this one. It is due to Cantor. I assume that you accept the fact that each real number
$x$ has a decimal expansion, $x = N.x_1x_2x_3\ldots$, and it is uniquely determined by $x$ if
one agrees never to terminate the expansion with an infinite string of 9s. (See also
Exercise 18.) We want to prove that $\mathbb{R}$ is uncountable. Suppose it is not uncountable.
Then it is countable and, being infinite, it must be denumerable. Accordingly let
$f : \mathbb{N} \to \mathbb{R}$ be a bijection. Using $f$, we list the elements of $\mathbb{R}$ along with their decimal
expansions as an array, and consider the digits $x_{ii}$ that occur along the diagonal in
this array. See Figure 14.


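\[
\begin{array}{rcl}
f(1) & = & N_1.\,x_{11}\;x_{12}\;x_{13}\;x_{14}\;x_{15}\;x_{16}\;x_{17}\;\dots \\
f(2) & = & N_2.\,x_{21}\;x_{22}\;x_{23}\;x_{24}\;x_{25}\;x_{26}\;x_{27}\;\dots \\
f(3) & = & N_3.\,x_{31}\;x_{32}\;x_{33}\;x_{34}\;x_{35}\;x_{36}\;x_{37}\;\dots \\
f(4) & = & N_4.\,x_{41}\;x_{42}\;x_{43}\;x_{44}\;x_{45}\;x_{46}\;x_{47}\;\dots \\
f(5) & = & N_5.\,x_{51}\;x_{52}\;x_{53}\;x_{54}\;x_{55}\;x_{56}\;x_{57}\;\dots \\
f(6) & = & N_6.\,x_{61}\;x_{62}\;x_{63}\;x_{64}\;x_{65}\;x_{66}\;x_{67}\;\dots \\
f(7) & = & N_7.\,x_{71}\;x_{72}\;x_{73}\;x_{74}\;x_{75}\;x_{76}\;x_{77}\;\dots
\end{array}
\]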


Figure 14 Cantor’s diagonal method 

For each $i$, choose a digit $y_i$ such that $y_i \neq x_{ii}$ and $y_i \neq 9$. Where is the number
$y = 0.y_1y_2y_3\dots$? Is it $f(1)$? No, because the first digit in the decimal expansion of
$f(1)$ is $x_{11}$ and $y_1 \neq x_{11}$. Is it $f(2)$? No, because the second digit in the decimal
expansion of $f(2)$ is $x_{22}$ and $y_2 \neq x_{22}$. Is it $f(k)$? No, because the $k$th digit in the
decimal expansion of $f(k)$ is $x_{kk}$ and $y_k \neq x_{kk}$. Nowhere in the list do we find $y$.
Nowhere! Therefore the list could not account for every real number, and $\mathbb{R}$ must
have been uncountable. □
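The diagonal method can be tried out on a finite array of digits. The following is only a sketch under stated assumptions: the function name `diagonal_escape` and the sample rows are made up for illustration, and a finite experiment proves nothing about $\mathbb{R}$. It merely shows how the rule "choose $y_i \neq x_{ii}$ and $y_i \neq 9$" produces a string that disagrees with every row.

```python
def diagonal_escape(rows):
    """Given digit strings rows[0], ..., rows[n-1], each of length >= n,
    return a digit string y with y[i] != rows[i][i] and y[i] != '9'."""
    out = []
    for i, row in enumerate(rows):
        d = row[i]
        # Any digit other than d and 9 will do; here we use 1 if d is 0, else 0.
        out.append("1" if d == "0" else "0")
    return "".join(out)


# Hypothetical sample list of seven digit strings (the digits after the decimal point).
rows = ["1415926", "7182818", "4142135", "0000000", "9999999", "5772156", "3010299"]
y = diagonal_escape(rows)

# y differs from the k-th row in the k-th digit, so y is not one of the rows.
print(y, y in rows)
```

Any other rule for picking $y_i$ would serve equally well, as long as it avoids $x_{ii}$ and 9.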

11 Corollary [a, b] and (a, b) are uncountable. 

Proof There are bijections from $(a, b)$ onto $(-1, 1)$ onto the unit semicircle onto $\mathbb{R}$
shown in Figure 15. The composite $f$ bijects $(a, b)$ onto $\mathbb{R}$, so $(a, b)$ is uncountable.
Since $[a, b]$ contains $(a, b)$, it too is uncountable. □

The remaining results in this section are of a more positive flavor.


12 Theorem Each infinite set $S$ contains a denumerable subset.

Proof Since $S$ is infinite it is nonempty and contains an element $s_1$. Since $S$ is
infinite the set $S \setminus \{s_1\} = \{s \in S : s \neq s_1\}$ is nonempty and there exists $s_2 \in S \setminus \{s_1\}$.
Since $S$ is an infinite set, $S \setminus \{s_1, s_2\} = \{s \in S : s \neq s_1, s_2\}$ is nonempty and there
exists $s_3 \in S \setminus \{s_1, s_2\}$. Continuing this way gives a list $(s_n)$ of distinct elements of
$S$. The set of these elements forms a denumerable subset of $S$. □

13 Theorem An infinite subset A of a denumerable set B is denumerable. 

Proof There exists a bijection $f : \mathbb{N} \to B$. Each element of $A$ appears exactly once
in the list $f(1), f(2), f(3), \dots$ of $B$. Define $g(k)$ to be the $k$th element of $A$ appearing
in the list. Since $A$ is infinite, $g(k)$ is defined for all $k \in \mathbb{N}$. Thus $g : \mathbb{N} \to A$ is a
bijection and $A$ is denumerable. □

14 Corollary The sets of even integers and of prime integers are denumerable. 

Proof They are infinite subsets of N which is denumerable. □ 

15 Theorem $\mathbb{N} \times \mathbb{N}$ is denumerable.

Proof Think of $\mathbb{N} \times \mathbb{N}$ as an $\infty \times \infty$ matrix and walk along the successive
counter-diagonals. See Figure 16. This gives a list

(1, 1), (2, 1), (1, 2), (3, 1), (2, 2), (1, 3), (4, 1), (3, 2), (2, 3), (1, 4), (5, 1), . . .

of $\mathbb{N} \times \mathbb{N}$ and proves that $\mathbb{N} \times \mathbb{N}$ is denumerable. □

Figure 16 Counter-diagonals in an $\infty \times \infty$ matrix

16 Corollary The Cartesian product of denumerable sets $A$ and $B$ is denumerable.

Proof $\mathbb{N} \sim \mathbb{N} \times \mathbb{N} \sim A \times B$. □
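The counter-diagonal walk in the proof of Theorem 15 is easy to mechanize. Here is a minimal sketch, assuming the walk visits the counter-diagonal $i + j = s$ in the order $(s-1, 1), (s-2, 2), \dots, (1, s-1)$, as the displayed list suggests. The generator name `counter_diagonals` is my own.

```python
from itertools import count, islice


def counter_diagonals():
    """Yield the pairs of N x N along the successive counter-diagonals i + j = s."""
    for s in count(2):          # s = i + j runs through 2, 3, 4, ...
        for j in range(1, s):   # j = 1, ..., s - 1 and i = s - j
            yield (s - j, j)


# The first eleven pairs of the list.
first = list(islice(counter_diagonals(), 11))
print(first)
```

Every pair $(i, j)$ shows up exactly once, on the counter-diagonal $s = i + j$, which is what makes the list a bijection from $\mathbb{N}$.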

17 Theorem If $f : \mathbb{N} \to B$ is a surjection and $B$ is infinite then $B$ is denumerable.

Proof For each $b \in B$, the set $\{k \in \mathbb{N} : f(k) = b\}$ is nonempty and hence contains a
smallest element; say $h(b) = k$ is the smallest integer that is sent to $b$ by $f$. Clearly, if
$b, b' \in B$ and $b \neq b'$ then $h(b) \neq h(b')$. That is, $h : B \to \mathbb{N}$ is an injection which bijects
$B$ to $hB \subset \mathbb{N}$. Since $B$ is infinite, so is $hB$. By Theorem 13, $hB$ is denumerable and
therefore so is $B$. □
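To see the map $h$ of this proof at work, here is a small sketch with a made-up surjection. The choice $f(k) = \lfloor k/3 \rfloor$ from $\mathbb{N} = \{1, 2, \dots\}$ onto $B = \{0, 1, 2, \dots\}$ is purely an assumption for illustration; it hits most values three times, so it is far from injective.

```python
from itertools import count


def f(k):
    """Hypothetical surjection from N = {1, 2, ...} onto B = {0, 1, 2, ...}."""
    return k // 3


def h(b):
    """Smallest k in N with f(k) == b. (Would search forever if b were not a value of f.)"""
    return next(k for k in count(1) if f(k) == b)


values = [h(b) for b in range(5)]
print(values)
```

Distinct elements $b$ get distinct values $h(b)$, and $f(h(b)) = b$, which is the injectivity used in the proof.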

18 Corollary The denumerable union of denumerable sets is denumerable. 

Proof Suppose that $A_1, A_2, \dots$ is a sequence of denumerable sets. List the elements
of $A_i$ as $a_{i1}, a_{i2}, \dots$ and define

\[ f : \mathbb{N} \times \mathbb{N} \to A = \bigcup A_i, \qquad (i, j) \mapsto a_{ij}. \]

Clearly $f$ is a surjection. According to Theorem 15, there is a bijection $g : \mathbb{N} \to
\mathbb{N} \times \mathbb{N}$. The composite $f \circ g$ is a surjection $\mathbb{N} \to A$. Since $A$ is infinite, Theorem 17
implies it is denumerable. □

19 Corollary Q is denumerable. 

Proof $\mathbb{Q}$ is the denumerable union of the denumerable sets $A_q = \{p/q : p \in \mathbb{Z}\}$ as
$q$ ranges over $\mathbb{N}$. □

20 Corollary For each $m \in \mathbb{N}$ the set $\mathbb{Q}^m$ is denumerable.

Proof Apply the induction principle. If $m = 1$ then the previous corollary states
that $\mathbb{Q}^1$ is denumerable. Knowing inductively that $\mathbb{Q}^{m-1}$ is denumerable and $\mathbb{Q}^m =
\mathbb{Q}^{m-1} \times \mathbb{Q}$, the result follows from Corollary 16. □

Combination laws for countable sets are similar to those for denumerable sets. As 
is easily checked, 

Every subset of a countable set is countable. 

A countable set that contains a denumerable subset is denumerable. 

The Cartesian product of finitely many countable sets is countable. 

The countable union of countable sets is countable.

5* Comparing Cardinalities

The following result gives a way to conclude that two sets have the same cardinality.
Roughly speaking the condition is that $\operatorname{card} A \leq \operatorname{card} B$ and $\operatorname{card} B \leq \operatorname{card} A$.

21 Schroeder-Bernstein Theorem If $A$, $B$ are sets and $f : A \to B$, $g : B \to A$
are injections then there exists a bijection $h : A \to B$.

Proof-sketch Consider the dynamic Venn diagram, Figure 17. The disc labeled $gfA$
is the image of $A$ under the map $g \circ f$. It is a subset of $A$. The ring between $A$ and
$gfA$ divides into two subrings. $A_0$ is the set of points in $A$ that do not lie in the image
of $g$, while $A_1$ is the set of points in the image of $g$ that do not lie in $gfA$. Similarly,
$B_0$ is the set of points in $B$ that do not lie in $fA$, while $B_1$ is the set of points in
$fA$ that do not lie in $fgB$. There is a natural bijection $h$ from the pair of rings
$A_0 \cup A_1 = A \setminus gfA$ to the pair of rings $B_0 \cup B_1 = B \setminus fgB$. It equals $f$ on the outer
ring $A_0 = A \setminus gB$ and it is $g^{-1}$ on the inner ring $A_1 = gB \setminus gfA$. (The map $g^{-1}$ is
not defined on all of $A$, but it is defined on the set $gB$.) In this notation, $h$ sends $A_0$
onto $B_1$ and sends $A_1$ onto $B_0$. It switches the indices.

Figure 17 Pictorial proof of the Schroeder-Bernstein Theorem

Repeat this on the next pair
of rings for $A$ and $B$. That is, look at $gfA$ instead of $A$ and $fgB$ instead of $B$. The
next two rings in $A$, $B$ are

\[
\begin{array}{ll}
A_2 = gfA \setminus gfgB & A_3 = gfgB \setminus gfgfA \\
B_2 = fgB \setminus fgfA & B_3 = fgfA \setminus fgfgB.
\end{array}
\]

Send $A_2$ to $B_3$ by $f$ and $A_3$ to $B_2$ by $g^{-1}$. The rings $A_i$ are disjoint, and so are
the rings $B_i$, so repetition gives a bijection

\[ \phi : \bigsqcup A_i \to \bigsqcup B_i \]

($\bigsqcup$ indicates disjoint union) defined by

\[
\phi(x) =
\begin{cases}
f(x) & \text{if } x \in A_i \text{ and } i \text{ is even} \\
g^{-1}(x) & \text{if } x \in A_i \text{ and } i \text{ is odd.}
\end{cases}
\]

Let $A^* = A \setminus (\bigsqcup A_i)$ and $B^* = B \setminus (\bigsqcup B_i)$ be the rest of $A$ and $B$. Then $f$ bijects
$A^*$ to $B^*$ and $\phi$ extends to a bijection $h : A \to B$ defined by

\[
h(x) =
\begin{cases}
\phi(x) & \text{if } x \in \bigsqcup A_i \\
f(x) & \text{if } x \in A^*.
\end{cases}
\]
□
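The ring construction can be followed step by step on a toy example. The sketch below is only an illustration under assumptions of my own: $A = B = \{0, 1, 2, \dots\}$, and both injections are the shift $n \mapsto n + 1$. To find the ring $A_i$ that contains $x$, pull $x$ back alternately by $g^{-1}$ and $f^{-1}$ until that is no longer possible; the number of successful pull-backs is the index $i$. The cut-off `max_steps` stands in for the set $A^*$ of points that can be pulled back forever, which is empty in this example.

```python
def f(a):
    """Hypothetical injection A -> B."""
    return a + 1


def g(b):
    """Hypothetical injection B -> A."""
    return b + 1


def f_inv(b):
    """Partial inverse of f; None when b does not lie in fA."""
    return b - 1 if b >= 1 else None


def g_inv(a):
    """Partial inverse of g; None when a does not lie in gB."""
    return a - 1 if a >= 1 else None


def ring_index(x, max_steps=10_000):
    """Index i of the ring A_i containing x, or None if the pull-back has not
    stopped after max_steps (x is then treated as a point of A*)."""
    i, point, in_A = 0, x, True
    while i < max_steps:
        prev = g_inv(point) if in_A else f_inv(point)
        if prev is None:
            return i
        point, in_A, i = prev, not in_A, i + 1
    return None


def h(x):
    i = ring_index(x)
    if i is None or i % 2 == 0:
        return f(x)      # even rings and A*: use f
    return g_inv(x)      # odd rings: use g^{-1}


print([h(x) for x in range(8)])
```

In this example $h$ swaps $0 \leftrightarrow 1$, $2 \leftrightarrow 3$, and so on, which is visibly a bijection even though neither $f$ nor $g$ is onto.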



A supplementary aid in understanding the Schroeder-Bernstein proof is the following
crossed ladder diagram, Figure 18.

Figure 18 Diagrammatic proof of the Schroeder-Bernstein Theorem


Exercise 36 asks you to show directly that $(a, b) \sim [a, b]$. This makes sense since
$(a, b) \subset [a, b] \subset \mathbb{R}$ and $(a, b) \sim \mathbb{R}$ should certainly imply $(a, b) \sim [a, b] \sim \mathbb{R}$. The
Schroeder-Bernstein theorem gives a quick indirect solution to the exercise. The
inclusion map $i : (a, b) \hookrightarrow [a, b]$ sending $x$ to $x$ injects $(a, b)$ into $[a, b]$, while the function
$j(x) = x/2 + (a + b)/4$ injects $[a, b]$ into $(a, b)$. The existence of the two injections
implies by the Schroeder-Bernstein Theorem that there is a bijection $(a, b) \sim [a, b]$.

6* The Skeleton of Calculus

The behavior of a continuous function defined on an interval $[a, b]$ is at the root of all
calculus theory. Using solely the Least Upper Bound Property of the real numbers we
rigorously derive the basic properties of such functions. The function $f : [a, b] \to \mathbb{R}$
is continuous if for each $\epsilon > 0$ and each $x \in [a, b]$ there is a $\delta > 0$ such that

\[ t \in [a, b] \text{ and } |t - x| < \delta \quad \Longrightarrow \quad |f(t) - f(x)| < \epsilon. \]


See Figure 19. 



Figure 19 The graph of a continuous function of a real variable 


Continuous functions are found everywhere in analysis and topology. Theo- 
rems 22, 23, and 24 present their simplest properties. Later we generalize these 
results to functions that are neither real valued nor dependent on a real variable. 
Although it is possible to give a combined proof of Theorems 22 and 23 I prefer to 
highlight the Least Upper Bound Property and keep them separate. 


22 Theorem The values of a continuous function defined on an interval $[a, b]$ form
a bounded subset of $\mathbb{R}$. That is, there exist $m, M \in \mathbb{R}$ such that for all $x \in [a, b]$ we
have $m \leq f(x) \leq M$.

Proof For $x \in [a, b]$, let $V_x$ be the value set of $f(t)$ as $t$ varies from $a$ to $x$,

\[ V_x = \{y \in \mathbb{R} : \text{for some } t \in [a, x] \text{ we have } y = f(t)\}. \]

Set

\[ X = \{x \in [a, b] : V_x \text{ is a bounded subset of } \mathbb{R}\}. \]

We must prove that $b \in X$. Clearly $a \in X$ and $b$ is an upper bound for $X$. Since
$X$ is nonempty and bounded above, there exists in $\mathbb{R}$ a least upper bound $c \leq b$ for
$X$. Take $\epsilon = 1$ in the definition of continuity at $c$. There exists a $\delta > 0$ such that
$|x - c| < \delta$ implies $|f(x) - f(c)| < 1$. Since $c$ is the least upper bound for $X$, there
exists $x \in X$ in the interval $[c - \delta, c]$. (Otherwise $c - \delta$ is a smaller upper bound for
$X$.) Now as $t$ varies from $a$ to $c$, the value $f(t)$ varies first in the bounded set $V_x$ and
then in the bounded set $J = (f(c) - 1, f(c) + 1)$. See Figure 20.



Figure 20 The value set V x and the interval J 


The union of two bounded sets is a bounded set and it follows that $V_c$ is bounded,
so $c \in X$. Besides, if $c < b$ then $f(t)$ continues to vary in the bounded set $J$ for $t > c$,
contrary to the fact that $c$ is an upper bound for $X$. Thus, $c = b$, $b \in X$, and the
values of $f$ form a bounded subset of $\mathbb{R}$. □


23 Theorem A continuous function $f$ defined on an interval $[a, b]$ takes on absolute
minimum and absolute maximum values: For some $x_0, x_1 \in [a, b]$ and for all $x \in [a, b]$
we have

\[ f(x_0) \leq f(x) \leq f(x_1). \]

Proof Let $M = \text{l.u.b.}\, f(t)$ as $t$ varies in $[a, b]$. By Theorem 22 $M$ exists. Consider
the set $X = \{x \in [a, b] : \text{l.u.b.}\, V_x < M\}$ where, as above, $V_x$ is the set of values of $f(t)$
as $t$ varies on $[a, x]$.

Case 1. $f(a) = M$. Then $f$ takes on a maximum at $a$ and the theorem is proved.

Case 2. $f(a) < M$. Then $X \neq \emptyset$ and we can consider the least upper bound of $X$, say
$c$. If $f(c) < M$, we choose $\epsilon > 0$ with $\epsilon < M - f(c)$. By continuity at $c$, there exists
a $\delta > 0$ such that $|t - c| < \delta$ implies $|f(t) - f(c)| < \epsilon$. Thus, $\text{l.u.b.}\, V_c < M$. If $c < b$
this implies there exist points $t$ to the right of $c$ at which $\text{l.u.b.}\, V_t < M$, contrary to
the fact that $c$ is an upper bound of such points. Therefore, $c = b$, which implies that
$M < M$, a contradiction. Having arrived at a contradiction from the supposition
that $f(c) < M$, we duly conclude that $f(c) = M$, so $f$ assumes a maximum at $c$. The
situation with minima is similar. □


24 Intermediate Value Theorem A continuous function defined on an interval
$[a, b]$ takes on (or “achieves,” “assumes,” or “attains”) all intermediate values: That
is, if $f(a) = \alpha$, $f(b) = \beta$, and $\gamma$ is given, $\alpha \leq \gamma \leq \beta$, then there is some $c \in [a, b]$
such that $f(c) = \gamma$. The same conclusion holds if $\beta \leq \gamma \leq \alpha$.


The theorem is pictorially obvious. A continuous function has a graph that is a 
curve without break points. Such a graph can not jump from one height to another. 
It must pass through all intermediate heights. 

Proof Set $X = \{x \in [a, b] : \text{l.u.b.}\, V_x \leq \gamma\}$ and $c = \text{l.u.b.}\, X$. Now $c$ exists because $X$
is nonempty (it contains $a$) and it is bounded above (by $b$). We claim that $f(c) = \gamma$,
as shown in Figure 21.

To prove it we just eliminate the other two possibilities which are $f(c) < \gamma$ and
$f(c) > \gamma$, by showing that each leads to a contradiction. Suppose that $f(c) < \gamma$
and take $\epsilon = \gamma - f(c)$. Continuity at $c$ gives $\delta > 0$ such that $|t - c| < \delta$ implies
$|f(t) - f(c)| < \epsilon$. That is,

\[ t \in (c - \delta, c + \delta) \quad \Longrightarrow \quad f(t) < \gamma, \]

so $c + \delta/2 \in X$, contrary to $c$ being an upper bound of $X$.

Suppose that $f(c) > \gamma$ and take $\epsilon = f(c) - \gamma$. Continuity at $c$ gives $\delta > 0$ such
that $|t - c| < \delta$ implies $|f(t) - f(c)| < \epsilon$. That is,

\[ t \in (c - \delta, c + \delta) \quad \Longrightarrow \quad f(t) > \gamma, \]

so $c - \delta/2$ is an upper bound for $X$, contrary to $c$ being the least upper bound for
$X$. Since $f(c)$ is neither $< \gamma$ nor $> \gamma$ we get $f(c) = \gamma$. □
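The proof above is an existence proof by way of a least upper bound; it does not tell you how to find $c$. A different, constructive technique is bisection, sketched below. This is not the method of the proof, and the name `bisect_value` and the tolerance are my own assumptions. It does lean on the same hypothesis: $f$ is continuous and $\gamma$ lies between $f(a)$ and $f(b)$.

```python
def bisect_value(f, a, b, gamma, tol=1e-12):
    """Approximate some c in [a, b] with f(c) = gamma, assuming f is continuous
    on [a, b] and gamma lies between f(a) and f(b)."""
    lo, hi = a, b
    increasing = f(a) <= f(b)    # True when f(a) <= gamma <= f(b)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        # Keep gamma between f(lo) and f(hi) at every step.
        if (f(mid) < gamma) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Example: f(x) = x^2 on [0, 2] takes the intermediate value 2 at the square root of 2.
c = bisect_value(lambda x: x * x, 0.0, 2.0, 2.0)
print(c)
```

At each step the value $\gamma$ stays trapped between the values of $f$ at the two endpoints of the shrinking interval, and continuity is what forces the limit point to have $f(c) = \gamma$.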


Figure 21 $x \in X$ implies that $f(x) \leq \gamma$.

A combination of Theorems 22, 23, 24, and Exercise 43 could well be called the

25 Fundamental Theorem of Continuous Functions Every continuous real valued
function of a real variable $x \in [a, b]$ is bounded, achieves minimum, intermediate,
and maximum values, and is uniformly continuous.


7* Visualizing the Fourth Dimension 

A lot of real analysis takes place in $\mathbb{R}^m$ but the full $m$-dimensionality is rarely
important. Rather, most analysis facts which are true when $m = 1, 2, 3$ remain true for
$m \geq 4$. Still, I suspect you would be happier if you could visualize $\mathbb{R}^4$, $\mathbb{R}^5$, etc. Here
is how to do it.

It is often said that time is the fourth dimension and that $\mathbb{R}^4$ should be thought
of as $xyzt$-space where a point has position $(x, y, z)$ in 3-space at time $t$. This is
only one possible way to think of a fourth dimension. Instead, you can think of color
as a fourth dimension. Imagine our usual 3-space with its $xyz$-coordinates in which
points are colorless. Then imagine that you can give color to points (“paint” them)
with shades of red indicating positive fourth coordinate and blue indicating negative
fourth coordinate. This gives $xyzc$-coordinates. Points with equal $xyz$-coordinates
but different colors are different points.

How is this useful? We have not used time as a coordinate, reserving it to describe
motion in 4-space. Figure 22 shows two circles - the unit circle $C$ in the horizontal
$xy$-plane and the circle $V$ with radius 1 and center (1, 0, 0) in the vertical $xz$-plane.
They are linked. No continuous motion can unlink them in 3-space without one
crossing the other. However, in Figure 23 you can watch them unlink in 4-space as
follows.

Figure 22 $C$ and $V$ are linked circles.

Just gradually give redness to C while dragging it leftward parallel to the x-axis, 
until it is to the left of V. (Leave V always fixed.) Then diminish the redness of 
C until it becomes colorless. It ends up to the left of V and no longer links it. In 
formulas we can let 

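\[ C(t) = \{(x, y, z, c) \in \mathbb{R}^4 : (x + 2t)^2 + y^2 = 1,\ z = 0, \text{ and } c = t(1 - t)\} \]
for $0 \leq t \leq 1$. See Figure 23.

The moving circle $C(t)$ never touches the stationary circle $V$. In particular, at
time $t = 1/2$ we have $C(t) \cap V = \emptyset$. For at that time the only point the two circles
could share has $xyz$-coordinates (0, 0, 0), and $(0, 0, 0, 1/4) \neq (0, 0, 0, 0)$.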

Other parameters can be used for higher dimensions. For example we could use 
pressure, temperature, chemical concentration, monetary value, etc. In theoretical 
mechanics one uses six parameters for a moving particle - three coordinates of position 
and three more for momentum. 

Moral Choosing a new parameter as the fourth dimension (color instead of time)
lets one visualize 4-space and observe motion there.

Figure 23 How to unlink linked circles using the fourth dimension

Exercises

0. Prove that for all sets $A, B, C$ the formula

$A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$

is true. Here is the solution written out in gory detail. Imitate this style in
writing out proofs in this course. See also the guidelines for writing a rigorous
proof on page 5. Follow them!

Hypothesis. $A, B, C$ are sets.

Conclusion. $A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$.

Proof. To prove two sets are equal we must show that every element of the first
set is an element of the second set and vice versa. Referring to Figure 24, let $x$
denote an element of the set $A \cap (B \cup C)$. It belongs to $A$ and it belongs to $B$
or to $C$. Therefore $x$ belongs to $A \cap B$ or it belongs to $A \cap C$. Thus $x$ belongs
to the set $(A \cap B) \cup (A \cap C)$ and we have shown that every element of the first
set $A \cap (B \cup C)$ belongs to the second set $(A \cap B) \cup (A \cap C)$.

Figure 24 $A$ is ruled vertically, $B$ and $C$ are ruled horizontally, $A \cap B$ is
ruled diagonally, and $A \cap C$ is ruled counter-diagonally.

On the other hand let $y$ denote an element of the set $(A \cap B) \cup (A \cap C)$. It
belongs to $A \cap B$ or it belongs to $A \cap C$. Therefore it belongs to $A$ and it belongs
to $B$ or to $C$. Thus $y$ belongs to $A \cap (B \cup C)$ and we have shown that every
element of the second set $(A \cap B) \cup (A \cap C)$ belongs to the first set $A \cap (B \cup C)$.
Since each element of the first set belongs to the second set and each element
of the second belongs to the first, the two sets are equal, $A \cap (B \cup C) =
(A \cap B) \cup (A \cap C)$. QED
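A computer check is no substitute for the proof just given, but it is a cheap way to catch a mis-stated formula before you try to prove it. Here is a sketch that tests the distributive law on random finite sets. The function name, the universe $\{0, \dots, 19\}$, and the number of trials are arbitrary choices of mine.

```python
import random


def check_distributive(trials=1000, size=20, seed=0):
    """Test A & (B | C) == (A & B) | (A & C) on random subsets of {0, ..., size - 1}."""
    rng = random.Random(seed)
    for _ in range(trials):
        A = {x for x in range(size) if rng.random() < 0.5}
        B = {x for x in range(size) if rng.random() < 0.5}
        C = {x for x in range(size) if rng.random() < 0.5}
        if A & (B | C) != (A & B) | (A & C):
            return False
    return True


print(check_distributive())
```

If you swap one of the operators by mistake, for instance writing `(A & B) & (A & C)` on the right, the check fails almost at once.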

1. Prove that for all sets $A, B, C$ the formula

$A \cup (B \cap C) = (A \cup B) \cap (A \cup C)$

is true.

2. If several sets $A, B, C, \dots$ all are subsets of the same set $X$ then the differences
$X \setminus A$, $X \setminus B$, $X \setminus C$, . . . are the complements of $A, B, C, \dots$ in $X$ and are
denoted $A^c, B^c, C^c, \dots$. The symbol $A^c$ is read “$A$ complement.”

(a) Prove that $(A^c)^c = A$.

(b) Prove De Morgan’s Law: $(A \cap B)^c = A^c \cup B^c$ and derive from it the law
$(A \cup B)^c = A^c \cap B^c$.

(c) Draw Venn diagrams to illustrate the two laws. 

(d) Generalize these laws to more than two sets. 

3. Recast the following English sentences in mathematics, using correct mathe- 
matical grammar. Preserve their meaning. 

(a) 2 is the smallest prime number. 

(b) The area of any bounded plane region is bisected by some line parallel to 
the x-axis. 

*(c) “All that glitters is not gold.” 

*4. What makes the following sentence ambiguous? “A death row prisoner can’t 
have too much hope.” 

5. Negate the following sentences in English using correct mathematical grammar. 

(a) If roses are red, violets are blue. 

*(b) He will sink unless he swims. 

6. Why is the square of an odd integer odd and the square of an even integer even? 
What is the situation for higher powers? [Hint: Prime factorization.] 

7. (a) Why does 4 divide every even integer square? 

(b) Why does 8 divide every even integer cube? 

(c) Why can 8 never divide twice an odd cube? 

(d) Prove that the cube root of 2 is irrational. 

8. Suppose that the natural number $k$ is not a perfect $n$th power.

(a) Prove that its $n$th root is irrational.

(b) Infer that the $n$th root of a natural number is either a natural number or
it is irrational. It is never a fraction.

9. Let $x = A|B$, $x' = A'|B'$ be cuts in $\mathbb{Q}$. We defined

$x + x' = (A + A') \mid \text{rest of } \mathbb{Q}$.

(a) Show that although $B + B'$ is disjoint from $A + A'$, it may happen in
degenerate cases that $\mathbb{Q}$ is not the union of $A + A'$ and $B + B'$.

(b) Infer that the definition of $x + x'$ as $(A + A') \mid (B + B')$ would be incorrect.

(c) Why did we not define $x \cdot x' = (A \cdot A') \mid \text{rest of } \mathbb{Q}$?

10. Prove that for each cut $x$ we have $x + (-x) = 0^*$. [This is not entirely trivial.]

11. A multiplicative inverse of a nonzero cut $x = A|B$ is a cut $y = C|D$ such that
$x \cdot y = 1^*$.

(a) If $x > 0^*$ what are $C$ and $D$?

(b) If $x < 0^*$ what are they?

(c) Prove that $x$ uniquely determines $y$.

12. Prove that there exists no smallest positive real number. Does there exist a 
smallest positive rational number? Given a real number x, does there exist a 
smallest real number y > x? 

13. Let $b = \text{l.u.b.}\, S$, where $S$ is a bounded nonempty subset of $\mathbb{R}$.

(a) Given $\epsilon > 0$ show that there exists an $s \in S$ with

$b - \epsilon \leq s \leq b$.

(b) Can $s \in S$ always be found so that $b - \epsilon < s < b$?

(c) If $x = A|B$ is a cut in $\mathbb{Q}$, show that $x = \text{l.u.b.}\, A$.

14. Prove that $\sqrt{2} \in \mathbb{R}$ by showing that $x \cdot x = 2$ where $x = A|B$ is the cut in $\mathbb{Q}$ with
$A = \{r \in \mathbb{Q} : r \leq 0 \text{ or } r^2 < 2\}$. [Hint: Use Exercise 13. See also Exercise 16,
below.]

15. Given $y \in \mathbb{R}$, $n \in \mathbb{N}$, and $\epsilon > 0$, show that for some $\delta > 0$, if $u \in \mathbb{R}$ and
$|u - y| < \delta$ then $|u^n - y^n| < \epsilon$. [Hint: Prove the inequality when $n = 1$, $n = 2$,
and then do induction on $n$ using the identity

\[ u^n - y^n = (u - y)(u^{n-1} + u^{n-2}y + \dots + y^{n-1}).] \]

16. Given $x > 0$ and $n \in \mathbb{N}$, prove that there is a unique $y > 0$ such that $y^n = x$.
That is, the $n$th root of $x$ exists and is unique. [Hint: Consider

\[ y = \text{l.u.b.}\{s \in \mathbb{R} : s^n \leq x\}. \]

Then use Exercise 15 to show that $y^n$ can be neither $< x$ nor $> x$.]

17. Let $x, y \in \mathbb{R}$ and $n \in \mathbb{N}$ be given.

(a) Prove that $x < y$ if and only if $x^n < y^n$.

(b) Infer from Exercise 16 that $x < y$ if and only if the $n$th root of $x$ is less
than the $n$th root of $y$.

18. Prove that real numbers correspond bijectively to decimal expansions not
terminating in an infinite string of nines, as follows. The decimal expansion of
$x \in \mathbb{R}$ is $N.x_1x_2\dots$, where $N$ is the largest integer $\leq x$, $x_1$ is the largest integer
$\leq 10(x - N)$, $x_2$ is the largest integer $\leq 100(x - (N + x_1/10))$, and so on.

(a) Show that each $x_k$ is a digit between 0 and 9.

(b) Show that for each $k$ there is an $\ell \geq k$ such that $x_\ell \neq 9$.

(c) Conversely, show that for each such expansion $N.x_1x_2\dots$ not terminating
in an infinite string of nines, the set

\[ \left\{ N,\ N + \frac{x_1}{10},\ N + \frac{x_1}{10} + \frac{x_2}{100},\ \dots \right\} \]

is bounded and its least upper bound is a real number $x$ with decimal
expansion $N.x_1x_2\dots$.

(d) Repeat the exercise with a general base in place of 10.

19. Formulate the definition of the greatest lower bound (g.l.b.) of a set of real 
numbers. State a g.l.b. property of R and show it is equivalent to the l.u.b. 
property of R. 

20. Prove that limits are unique, i.e., if $(a_n)$ is a sequence of real numbers that
converges to a real number $b$ and also converges to a real number $b'$, then
$b = b'$.


21. Let $f : A \to B$ be a function. That is, $f$ is some rule or device which, when
presented with any element $a \in A$, produces an element $b = f(a)$ of $B$. The
graph of $f$ is the set $S$ of all pairs $(a, b) \in A \times B$ such that $b = f(a)$.

(a) If you are given a subset $S \subset A \times B$, how can you tell if it is the graph of
some function? (That is, what are the set theoretic properties of a graph?)

(b) Let $g : B \to C$ be a second function and consider the composed function
$g \circ f : A \to C$. Assume that $A = B = C = [0, 1]$, draw $A \times B \times C$ as the
unit cube in 3-space, and try to relate the graphs of $f$, $g$, and $g \circ f$ in the
cube.

22. A fixed-point of a function $f : A \to A$ is a point $a \in A$ such that $f(a) = a$.
The diagonal of $A \times A$ is the set of all pairs $(a, a)$ in $A \times A$.

(a) Show that $f : A \to A$ has a fixed-point if and only if the graph of $f$
intersects the diagonal.

(b) Prove that every continuous function $f : [0, 1] \to [0, 1]$ has at least one
fixed-point.

(c) Is the same true for continuous functions $f : (0, 1) \to (0, 1)$?†

(d) Is the same true for discontinuous functions?

23. A rational number $p/q$ is dyadic if $q$ is a power of 2, $q = 2^k$ for some nonnegative
integer $k$. For example, 0, 3/8, 3/1, −3/256, are dyadic rationals, but 1/3, 5/12
are not. A dyadic interval is $[a, b]$ where $a = p/2^k$ and $b = (p + 1)/2^k$. For
example, [.75, 1] is a dyadic interval but $[1, \pi]$, [0, 2], and [.25, .75] are not. A
dyadic cube is the product of dyadic intervals having equal length. The set of
dyadic rationals may be denoted as $\mathbb{Q}_2$ and the dyadic lattice as $\mathbb{Q}_2^m$.

(a) Prove that any two dyadic squares (i.e., planar dyadic cubes) of the same
size are either identical, intersect along a common edge, intersect at a
common vertex, or do not intersect at all.

(b) Show that the corresponding intersection property is true for dyadic cubes
in $\mathbb{R}^m$.

†A question posed in this manner means that, as well as answering the question with a “yes” or
a “no,” you should give a proof if your answer is “yes” or a specific counterexample if your answer
is “no.” Also, to do this exercise you should read Theorems 22, 23, 24.


24. Given a cube in $\mathbb{R}^m$, what is the largest ball it contains? Given a ball in
$\mathbb{R}^m$, what is the largest cube it contains? What are the largest ball and cube
contained in a given box in $\mathbb{R}^m$?

25. (a) Given $\epsilon > 0$, show that the unit disc contains finitely many dyadic squares
whose total area exceeds $\pi - \epsilon$, and which intersect each other only along
their boundaries.

**(b) Show that the assertion remains true if we demand that the dyadic squares
are disjoint.

(c) Formulate (a) in dimension $m = 3$ and $m \geq 4$.

**(d) Do the analysis with squares and discs interchanged. That is, given $\epsilon > 0$
prove that finitely many disjoint closed discs can be drawn inside the unit
square so that the total area of the discs exceeds $1 - \epsilon$. [Hint: The Pile
of Sand Principle. On the first day of work, take away 1/16 of a pile of
sand. On the second day take away 1/16 of the remaining pile of sand.
Continue. What happens to the pile of sand after $n$ days when $n \to \infty$?
Instead of sand, think of your obligation to place finitely many disjoint
dyadic squares (or discs) that occupy at least 1/16 of the area of the unit
disc (or unit square).]

*26. Let $b(R)$ and $s(R)$ be the number of integer unit cubes in $\mathbb{R}^m$ that intersect the
ball and sphere of radius $R$, centered at the origin.

(a) Let $m = 2$ and calculate the limits

\[ \lim_{R \to \infty} \frac{s(R)}{b(R)} \quad \text{and} \quad \lim_{R \to \infty} \frac{s(R)^2}{b(R)}. \]

(b) Take $m \geq 3$. What exponent $k$ makes the limit

\[ \lim_{R \to \infty} \frac{s(R)^k}{b(R)} \]

interesting?

(c) Let $c(R)$ be the number of integer unit cubes that are contained in the
ball of radius $R$, centered at the origin. Calculate

\[ \lim_{R \to \infty} \frac{c(R)}{b(R)}. \]

(d) Shift the ball to a new, arbitrary center (not on the integer lattice) and
re-calculate the limits.

27. Prove that the interval $[a, b]$ in $\mathbb{R}$ is the same as the segment $[a, b]$ in $\mathbb{R}^1$. That
is,

\[ \{x \in \mathbb{R} : a \leq x \leq b\} = \{y \in \mathbb{R} : \exists\, s, t \in [0, 1] \text{ with } s + t = 1 \text{ and } y = sa + tb\}. \]

[Hint: How do you prove that two sets are equal?]

28. A convex combination of $w_1, \dots, w_k \in \mathbb{R}^m$ is a vector sum

\[ w = s_1 w_1 + \dots + s_k w_k \]

such that $s_1 + \dots + s_k = 1$ and $0 \leq s_1, \dots, s_k \leq 1$.

(a) Prove that if a set $E$ is convex then $E$ contains the convex combination of
any finite number of points in $E$.

(b) Why is the converse obvious?

29. (a) Prove that the ellipsoid

\[ E = \left\{ (x, y, z) \in \mathbb{R}^3 : \frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2} \leq 1 \right\} \]

is convex. [Hint: $E$ is the unit ball for a different dot product. What is
it? Does the Cauchy-Schwarz inequality not apply to all dot products?]

(b) Prove that all boxes in $\mathbb{R}^m$ are convex.

30. A function $f : (a, b) \to \mathbb{R}$ is a convex function if for all $x, y \in (a, b)$ and all
$s, t \in [0, 1]$ with $s + t = 1$ we have

\[ f(sx + ty) \leq s f(x) + t f(y). \]

(a) Prove that $f$ is convex if and only if the set $S$ of points above its graph is
convex in $\mathbb{R}^2$. The set $S$ is $\{(x, y) : f(x) \leq y\}$.

*(b) Prove that every convex function is continuous.

(c) Suppose that $f$ is convex and $a < x < u < b$. The slope $\sigma$ of the line
through $(x, f(x))$ and $(u, f(u))$ depends on $x$ and $u$, say $\sigma = \sigma(x, u)$. Prove
that $\sigma$ increases when $x$ increases, and $\sigma$ increases when $u$ increases.

(d) Suppose that $f$ is second-order differentiable. That is, $f$ is differentiable
and its derivative $f'$ is also differentiable. As is standard, we write $(f')' =
f''$. Prove that $f$ is convex if and only if $f''(x) \geq 0$ for all $x \in (a, b)$.

(e) Formulate a definition of convexity for a function $f : M \to \mathbb{R}$ where
$M \subset \mathbb{R}^m$ is a convex set. [Hint: Start with $m = 2$.]

*31. Suppose that a function $f : [a, b] \to \mathbb{R}$ is monotone nondecreasing. That is,
$x_1 \leq x_2$ implies $f(x_1) \leq f(x_2)$.

(a) Prove that $f$ is continuous except at a countable set of points. [Hint: Show
that at each $x \in (a, b)$, $f$ has right limit $f(x+)$ and a left limit $f(x-)$,
which are limits of $f(x + h)$ as $h$ tends to 0 through positive and negative
values respectively. The jump of $f$ at $x$ is $f(x+) - f(x-)$. Show that $f$ is
continuous at $x$ if and only if it has zero jump at $x$. At how many points
can $f$ have jump $\geq 1$? At how many points can the jump be between 1/2
and 1? Between 1/3 and 1/2?]

(b) Is the same assertion true for a monotone function defined on all of $\mathbb{R}$?

*32. Suppose that $E$ is a convex region in the plane bounded by a curve $C$.

(a) Show that $C$ has a tangent line except at a countable number of points.
[For example, the circle has a tangent line at all its points. The triangle
has a tangent line except at three points, and so on.]

(b) Similarly, show that a convex function has a derivative except at a
countable set of points.

*33. Let $f(k, m)$ be the number of $k$-dimensional faces of the $m$-cube. See Table 1.

          m = 1   m = 2   m = 3   m = 4     m = 5     ...   m         m + 1
k = 0     2       4       8       f(0,4)    f(0,5)    ...   f(0,m)    f(0,m+1)
k = 1     1       4       12      f(1,4)    f(1,5)    ...   f(1,m)    f(1,m+1)
k = 2     0       1       6       f(2,4)    f(2,5)    ...   f(2,m)    f(2,m+1)
k = 3     0       0       1       f(3,4)    f(3,5)    ...   f(3,m)    f(3,m+1)
k = 4     0       0       0       f(4,4)    f(4,5)    ...   f(4,m)    f(4,m+1)
...       ...     ...     ...     ...       ...       ...   ...       ...

Table 1 $f(k, m)$ is the number of $k$-dimensional faces of the $m$-cube.

(a) Verify the numbers in the first three columns.

(b) Calculate the columns $m = 4$, $m = 5$, and give the formula for passing
from the $m$th column to the $(m + 1)$st.

(c) What would an $m = 0$ column mean?

(d) Prove that the alternating sum of the entries in any column is 1. That is,
$2 - 1 = 1$, $4 - 4 + 1 = 1$, $8 - 12 + 6 - 1 = 1$, and in general $\sum (-1)^k f(k, m) =
1$. This alternating sum is called the Euler characteristic.
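If you want to check your table entries by machine before looking for the pattern, here is a brute-force sketch. It rests on a description of faces that the text does not state, so treat it as an assumption: a face of the cube $[0, 1]^m$ is obtained by fixing each coordinate at 0 or at 1, or by letting it range freely over $[0, 1]$, and the dimension of the face is the number of free coordinates. The name `face_count` is my own.

```python
from itertools import product


def face_count(k, m):
    """Count the k-dimensional faces of [0, 1]^m by brute force. A face is coded
    as a word in the letters '0', '1', '*', where '*' marks a free coordinate."""
    return sum(1 for word in product("01*", repeat=m) if word.count("*") == k)


# The first three columns of Table 1, rows k = 0, ..., 4.
for m in (1, 2, 3):
    print(m, [face_count(k, m) for k in range(5)])

# Alternating sums of the columns m = 1, ..., 6.
print([sum((-1) ** k * face_count(k, m) for k in range(m + 1)) for m in range(1, 7)])
```

The count is exponential in $m$, so this is only practical for small cubes, and of course it proves nothing; parts (b) and (d) still ask for a formula and a proof.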

34. Find an exact formula for a bijection $f : \mathbb{N} \times \mathbb{N} \to \mathbb{N}$. Is one

\[ f(i, j) = j + (1 + 2 + \dots + (i + j - 2)) = \frac{i^2 + j^2 + i(2j - 3) - j + 2}{2}\,? \]
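Before trying to prove that a candidate formula is a bijection, it does no harm to test it on an initial chunk of $\mathbb{N} \times \mathbb{N}$. The sketch below evaluates the candidate from the exercise on all pairs with $i + j \leq 32$ and checks that the values are exactly $1, 2, \dots, 496$ with no repeats. The function names and the cut-off 32 are my own choices, and a finite check like this is evidence, not a proof.

```python
def pair_index(i, j):
    """The candidate from the exercise: j + (1 + 2 + ... + (i + j - 2))."""
    n = i + j - 2
    return j + n * (n + 1) // 2


def closed_form(i, j):
    """The same candidate written as a single quadratic expression."""
    return (i * i + j * j + i * (2 * j - 3) - j + 2) // 2


# All pairs (i, j) of natural numbers with i + j <= 32.
pairs = [(i, j) for i in range(1, 32) for j in range(1, 32) if i + j <= 32]
values = sorted(pair_index(i, j) for i, j in pairs)

print(values == list(range(1, len(pairs) + 1)))
print(all(pair_index(i, j) == closed_form(i, j) for i, j in pairs))
```

The first few values, $f(1,1) = 1$, $f(2,1) = 2$, $f(1,2) = 3$, $f(3,1) = 4$, follow the counter-diagonal list in the proof of Theorem 15.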


35. Prove that the union of denumerably many sets $B_k$, each of which is countable,
is countable. How could it happen that the union is finite?

*36. Without using the Schroeder-Bernstein Theorem,

(a) Prove that $[a, b] \sim (a, b] \sim (a, b)$.

(b) More generally, prove that if $C$ is countable then

\[ \mathbb{R} \setminus C \sim \mathbb{R} \sim \mathbb{R} \cup C. \]

(c) Infer that the set of irrational numbers has the same cardinality as $\mathbb{R}$, i.e.,
$\mathbb{R} \setminus \mathbb{Q} \sim \mathbb{R}$. [Hint: Imagine that you are the owner of denumerably many
hotels, $H_1, H_2, \dots$, all fully occupied, and that a traveler arrives and asks
you for accommodation. How could you re-arrange your current guests to
make room for the traveler?]

*37. Prove that $\mathbb{R}^2 \sim \mathbb{R}$. [Hint: Think of shuffling two digit strings

\[ (a_1 a_2 a_3 \dots) \,\&\, (b_1 b_2 b_3 \dots) \to (a_1 b_1 a_2 b_2 a_3 b_3 \dots). \]

In this way you could transform a pair of reals into a single real. Be sure to
face the nines-termination issue.]
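The shuffle in the hint is easy to try on finite digit strings. The sketch below only illustrates the interleaving and its inverse; the function names are my own, the strings are truncated to finite length, and it does not address the nines-termination issue, which is the real content of the exercise.

```python
def shuffle_digits(a, b):
    """Interleave two digit strings of equal length: a1 b1 a2 b2 a3 b3 ..."""
    if len(a) != len(b):
        raise ValueError("digit strings of equal length expected")
    return "".join(x + y for x, y in zip(a, b))


def unshuffle_digits(c):
    """Recover the two strings from their shuffle."""
    return c[0::2], c[1::2]


c = shuffle_digits("141", "718")
print(c, unshuffle_digits(c))
```

Because the shuffle can be undone, no two pairs of strings are sent to the same string.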

38. Let $S$ be a set and let $\mathcal{P} = \mathcal{P}(S)$ be the collection of all subsets of $S$. [$\mathcal{P}(S)$ is
called the power set of $S$.] Let $\mathcal{F}$ be the set of functions $f : S \to \{0, 1\}$.

(a) Prove that there is a natural bijection from $\mathcal{F}$ onto $\mathcal{P}$ defined by

\[ f \mapsto \{s \in S : f(s) = 1\}. \]

*(b) Prove that the cardinality of $\mathcal{P}$ is greater than the cardinality of $S$, even
when $S$ is empty or finite.

[Hints: The notation $Y^X$ is sometimes used for the set of all functions $X \to Y$.
In this notation $\mathcal{F} = \{0, 1\}^S$ and assertion (b) becomes $\#(S) < \#(\{0, 1\}^S)$.
The empty set has one subset, itself, whereas it has no elements, so $\#(\emptyset) = 0$,
while $\#(\{0, 1\}^{\emptyset}) = 1$, which proves (b) for the empty set. Assume there is a
bijection from $S$ onto $\mathcal{P}$. Then there is a bijection $\beta : S \to \mathcal{F}$, and for each
$s \in S$, $\beta(s)$ is a function, say $f_s : S \to \{0, 1\}$. Think like Cantor and try to find
a function which corresponds to no $s$. Infer that $\beta$ could not have been onto.]
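For a small finite set you can watch the Cantor-style argument succeed against every possible map at once. The sketch below works with subsets rather than with $\{0, 1\}$-valued functions, which is harmless in view of part (a). The names `power_set` and `missed_subset` are my own, and the exhaustive search over the three-element set $\{0, 1, 2\}$ is an illustration, not a proof for general $S$.

```python
from itertools import combinations, product


def power_set(S):
    """All subsets of the finite set S, as frozensets."""
    items = sorted(S)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def missed_subset(S, beta):
    """The 'diagonal' subset {s in S : s not in beta[s]} for a map beta from S to subsets of S."""
    return frozenset(s for s in S if s not in beta[s])


S = {0, 1, 2}
subsets = power_set(S)

# Try all 8^3 = 512 maps beta from S to its power set; none has the diagonal subset in its image.
never_onto = all(
    missed_subset(S, dict(zip(sorted(S), images))) not in images
    for images in product(subsets, repeat=len(S))
)
print(len(subsets), never_onto)
```

The diagonal subset differs from the image of $s$ in whether or not it contains $s$ itself, exactly as $y$ differs from $f(k)$ in the $k$th digit in the proof of Theorem 10.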

39. A real number is algebraic if it is a root of a nonconstant polynomial with 
integer coefficients. 

(a) Prove that the set $A$ of algebraic numbers is denumerable. [Hint: Each
polynomial has how many roots? How many linear polynomials are there?
How many quadratics? . . .]

(b) Repeat the exercise for roots of polynomials whose coefficients belong to
some fixed, arbitrary denumerable set $S \subset \mathbb{R}$.

*(c) Repeat the exercise for roots of trigonometric polynomials with integer
coefficients.

(d) Real numbers that are not algebraic are said to be transcendental. Trying
to find transcendental numbers is said to be like looking for hay in a
haystack. Why?

40. A finite word is a finite string of letters, say from the roman alphabet. 

(a) What is the cardinality of the set of all finite words, and thus of the set of 
all possible poems and mathematical proofs? 

(b) What if the alphabet had only two letters? 

(c) What if it had countably many letters? 

(d) Prove that the cardinality of the set of all infinite words formed using
a finite alphabet of $n$ letters, $n \geq 2$, is equal to the cardinality of $\mathbb{R}$.

(e) Give a solution to Exercise 37 by justifying the equivalence chain

R 2 = R x R E2 x S2 ^ S4 x S4 R. 


(f) How many decimal expansions terminate in an infinite string of 9’s? How
many don’t?

41. If $v$ is a value of a continuous function $f : [a, b] \to \mathbb{R}$ use the Least Upper
Bound Property to prove that there are smallest and largest $x \in [a, b]$ such that
$f(x) = v$.

42. A function defined on an interval $[a, b]$ or $(a, b)$ is uniformly continuous
if for each $\epsilon > 0$ there exists a $\delta > 0$ such that $|x - t| < \delta$ implies that
$|f(x) - f(t)| < \epsilon$. (Note that this $\delta$ cannot depend on $x$, it can only depend
on $\epsilon$. With ordinary continuity, the $\delta$ can depend on both $x$ and $\epsilon$.)

(a) Show that a uniformly continuous function is continuous but continuity
does not imply uniform continuity. (For example, prove that $\sin(1/x)$ is
continuous on the interval $(0, 1)$ but is not uniformly continuous there.
Graph it.)

(b) Is the function $2x$ uniformly continuous on the unbounded interval $(-\infty, \infty)$?

(c) What about $x^2$?

*43. Prove that a continuous function defined on an interval $[a, b]$ is uniformly
continuous. [Hint: Let $\epsilon > 0$ be given. Think of $\epsilon$ as fixed and consider the
sets

$$A(\delta) = \{u \in [a, b] : \text{if } x, t \in [a, u] \text{ and } |x - t| < \delta \text{ then } |f(x) - f(t)| < \epsilon\}$$

$$A = \bigcup_{\delta > 0} A(\delta).$$

Using the Least Upper Bound Property, prove that $b \in A$. Infer that $f$ is
uniformly continuous. The fact that continuity on $[a, b]$ implies uniform continuity
is one of the important, fundamental principles of continuous functions.]

*44. Define injections $f : \mathbb{N} \to \mathbb{N}$ and $g : \mathbb{N} \to \mathbb{N}$ by $f(n) = 2n$ and $g(n) = 2n$. From
$f$ and $g$, the Schroeder-Bernstein Theorem produces a bijection $\mathbb{N} \to \mathbb{N}$. What
is it?

*45. Let $(a_n)$ be a sequence of real numbers. It is bounded if the set $A = \{a_1, a_2, \ldots\}$
is bounded. The limit supremum, or lim sup, of a bounded sequence $(a_n)$ as
$n \to \infty$ is

$$\limsup_{n \to \infty} a_n = \lim_{n \to \infty} \Big( \sup_{k \geq n} a_k \Big).$$

(a) Why does the lim sup exist?

(b) If $\sup\{a_n\} = \infty$, how should we define $\limsup_{n \to \infty} a_n$?

(c) If $\lim_{n \to \infty} a_n = -\infty$, how should we define $\limsup_{n \to \infty} a_n$?

(d) When is it true that

$$\limsup_{n \to \infty} (a_n + b_n) \leq \limsup_{n \to \infty} a_n + \limsup_{n \to \infty} b_n$$

$$\limsup_{n \to \infty} c\,a_n = c \limsup_{n \to \infty} a_n \,?$$

When is it true they are unequal? Draw pictures that illustrate these
relations.

(e) Define the limit infimum, or lim inf, of a sequence of real numbers, and
find a formula relating it to the limit supremum.

(f) Prove that $\lim_{n \to \infty} a_n$ exists if and only if the sequence $(a_n)$ is bounded and

$$\liminf_{n \to \infty} a_n = \limsup_{n \to \infty} a_n.$$

**46. The unit ball with respect to a norm $\|\ \|$ on $\mathbb{R}^2$ is

$$\{v \in \mathbb{R}^2 : \|v\| \leq 1\}.$$

(a) Find necessary and sufficient geometric conditions on a subset of $\mathbb{R}^2$ that
it is the unit ball for some norm.

(b) Give necessary and sufficient geometric conditions that a subset be the
unit ball for a norm arising from an inner product.

(c) Generalize to $\mathbb{R}^m$. [You may find it useful to read about closed sets in the
next chapter, and to consult Exercise 41 there.]

47. Assume that $V$ is an inner product space whose inner product induces a norm
as $|x| = \sqrt{\langle x, x \rangle}$.

(a) Show that $|\ |$ obeys the parallelogram law

$$|x + y|^2 + |x - y|^2 = 2|x|^2 + 2|y|^2$$

for all $x, y \in V$.

*(b) Show that any norm obeying the parallelogram law arises from a unique
inner product. [Hints: Define the prospective inner product as

$$\langle x, y \rangle = \left| \frac{x + y}{2} \right|^2 - \left| \frac{x - y}{2} \right|^2.$$

Checking that $\langle\ ,\ \rangle$ satisfies the inner product properties of symmetry and
positive definiteness is easy. Also, it is immediate that $|x|^2 = \langle x, x \rangle$, so
$\langle\ ,\ \rangle$ induces the given norm. Checking bilinearity is another story.

(i) Let $x, y, z \in V$ be arbitrary. Show that the parallelogram law implies

$$\langle x + y, z \rangle + \langle x - y, z \rangle = 2 \langle x, z \rangle,$$

and infer that $\langle 2x, z \rangle = 2 \langle x, z \rangle$. For arbitrary $u, v \in V$ set $x = \tfrac{1}{2}(u + v)$
and $y = \tfrac{1}{2}(u - v)$, plug in to the previous equation, and deduce

$$\langle u, z \rangle + \langle v, z \rangle = \langle u + v, z \rangle,$$

which is additive bilinearity in the first variable. Why does it now follow
at once that $\langle\ ,\ \rangle$ is also additively bilinear in the second variable?

(ii) To check multiplicative bilinearity, prove by induction that if $m \in \mathbb{Z}$
then $m \langle x, y \rangle = \langle mx, y \rangle$, and if $n \in \mathbb{N}$ then $\tfrac{1}{n} \langle x, y \rangle = \langle \tfrac{1}{n} x, y \rangle$. Infer
that $r \langle x, y \rangle = \langle rx, y \rangle$ when $r$ is rational. Is $\lambda \mapsto \langle \lambda x, y \rangle - \lambda \langle x, y \rangle$
a continuous function of $\lambda \in \mathbb{R}$, and does this give multiplicative
bilinearity?]

48. Consider a knot in 3-space as shown in Figure 25. In 3-space it cannot be
unknotted. How can you unknot it in 4-space?

Figure 25 An overhand knot in 3-space

*49. Prove that there exists no continuous three dimensional motion de-linking the 
two circles shown in Figure 22 which keeps both circles flat at all times. 

50. The Klein bottle is a surface that has an oval of self-intersection when it is
shown in 3-space. See Figure 26. It can live in 4-space with no self-intersection.
How?

Figure 26 The Klein Bottle in 3-space has an oval of self-intersection.

51. Read Flatland by Edwin Abbott. Try to imagine a Flatlander using color to
visualize 3-space.

52. Can you visualize a 4-dimensional cube - its vertices, edges, and faces? [Hint: It 
may be easier (and equivalent) to picture a 4-dimensional parallelepiped whose 
eight red vertices have xyz - coordinates that differ from the xyz-coordinates 
of its eight colorless vertices. It is a 4-dimensional version of a rectangle or 
parallelogram whose edges are not parallel to the coordinate axes.] 


2 

A Taste of Topology 


1 Metric Spaces 

It may seem paradoxical at first, but a specific math problem can be harder to solve 
than some abstract generalization of it. For instance if you want to know how many 
roots the equation 

$$t^5 - 4t^4 + t^3 - t + 1 = 0$$

can have then you could use calculus and figure it out. It would take a while. But
thinking more abstractly, and with less work, you could show that every $n$th-degree
polynomial has at most $n$ roots. In the same way many general results about functions
of a real variable are more easily grasped at an abstract level - the level of metric 
spaces. 

Metric space theory can be seen as a special case of general topology, and many 
books present it that way, explaining compactness primarily in terms of open cov- 
erings. In my opinion, however, the sequence/subsequence approach provides the 
easiest and simplest route to mastering the subject. Accordingly it gets top billing 
throughout this chapter. 

A metric space is a set $M$, the elements of which are referred to as points of $M$,
together with a metric $d$ having the three properties that distance has in Euclidean
space. The metric $d = d(x, y)$ is a real number defined for all points $x, y \in M$ and
$d(x, y)$ is called the distance from the point $x$ to the point $y$. The three distance
properties are as follows: For all $x, y, z \in M$ we have

(a) positive definiteness: $d(x, y) \geq 0$, and $d(x, y) = 0$ if and only if $x = y$.

(b) symmetry: $d(x, y) = d(y, x)$.

(c) triangle inequality: $d(x, z) \leq d(x, y) + d(y, z)$.

The function d is also called the distance function. Strictly speaking, it is 
the pair (M, d) which is a metric space, but we will follow the common practice of 
referring to “the metric space M,” and leave to you the job of inferring the correct 
metric. 


The main examples of metric spaces are $\mathbb{R}$, $\mathbb{R}^m$, and their subsets. The metric
on $\mathbb{R}$ is $d(x, y) = |x - y|$ where $x, y \in \mathbb{R}$ and $|x - y|$ is the magnitude of $x - y$. The
metric on $\mathbb{R}^m$ is the Euclidean length of $x - y$ where $x, y$ are vectors in $\mathbb{R}^m$. Namely,

$$d(x, y) = \sqrt{(x_1 - y_1)^2 + \cdots + (x_m - y_m)^2}$$

for $x = (x_1, \ldots, x_m)$ and $y = (y_1, \ldots, y_m)$.

Since Euclidean length satisfies the three distance properties, $d$ is a bona fide
metric and it makes $\mathbb{R}^m$ into a metric space. A subset $M \subset \mathbb{R}^m$ becomes a metric
space when we declare the distance between points of $M$ to be their Euclidean distance
apart as points in $\mathbb{R}^m$. We say that $M$ inherits its metric from $\mathbb{R}^m$ and is a metric
subspace of $\mathbb{R}^m$. Figure 27 shows a few subsets of $\mathbb{R}^2$ to suggest some interesting
metric spaces.

There is also one metric that is hard to picture but valuable as a source for
counterexamples, the discrete metric. Given any set $M$, define the distance between
distinct points of $M$ to be 1 and the distance between every point and itself to be
0. This is a metric. See Exercise 4. If $M$ consists of three points, say $M = \{a, b, c\}$,
you can think of the vertices of the unit equilateral triangle as a model for $M$. See
Figure 28. They have mutual distance 1 from each other. If $M$ consists of one, two, or
four points can you think of a model for the discrete metric on $M$? More challenging
is to imagine the discrete metric on $\mathbb{R}$. All points, by definition of the discrete metric,
lie at unit distance from each other.
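The three distance properties can be spot-checked numerically on a finite sample of points. The sketch below is only an illustration, not a proof: the helper names (`euclidean`, `discrete`, `satisfies_metric_axioms`) and the sample points are assumptions introduced here, and passing on a finite sample does not establish the axioms in general.

```python
import itertools
import math

def euclidean(x, y):
    """Euclidean distance between two points of R^m given as tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def discrete(x, y):
    """Discrete metric: 0 from a point to itself, 1 between distinct points."""
    return 0 if x == y else 1

def satisfies_metric_axioms(d, points, tol=1e-12):
    """Check the three distance properties on every triple drawn from `points`."""
    for x, y, z in itertools.product(points, repeat=3):
        if d(x, y) < 0 or (d(x, y) == 0) != (x == y):
            return False          # positive definiteness fails
        if abs(d(x, y) - d(y, x)) > tol:
            return False          # symmetry fails
        if d(x, z) > d(x, y) + d(y, z) + tol:
            return False          # triangle inequality fails
    return True

sample = [(0.0, 0.0), (1.0, 0.0), (0.5, 2.0), (-3.0, 1.5)]
print(satisfies_metric_axioms(euclidean, sample))
print(satisfies_metric_axioms(discrete, sample))

# For contrast, squared distance on R is not a metric:
# d(0, 2) = 4 exceeds d(0, 1) + d(1, 2) = 2.
squared = lambda x, y: (x - y) ** 2
print(satisfies_metric_axioms(squared, [0.0, 1.0, 2.0]))
```

The last call shows why the triangle inequality is a genuine restriction: a function can be symmetric and positive definite and still fail to be a metric.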


Convergent Sequences and Subsequences 

A sequence of points in a metric space $M$ is a list $p_1, p_2, \ldots$ where the points
$p_n$ belong to $M$. Repetition is allowed, and not all the points of $M$ need to appear
in the list. Good notation for a sequence is $(p_n)$, or $(p_n)_{n \in \mathbb{N}}$. The notation $\{p_n\}$
is also used but it is too easily confused with the set of points making up the
sequence. The difference between $(p_n)_{n \in \mathbb{N}}$ and $\{p_n : n \in \mathbb{N}\}$ is that in the former case
the sequence prescribes an ordering of the points, while in the latter the points get
jumbled together. For example, the sequences $1, 2, 3, \ldots$ and $1, 2, 1, 3, 2, 1, 4, 3, 2, 1, \ldots$
are different sequences but give the same set of points, namely $\mathbb{N}$.

Figure 27 Five metric spaces - a closed outward spiral, a Hawaiian earring,
a topologist’s sine circle, an infinite television antenna, and Zeno’s maze

Figure 28 The vertices of the unit equilateral triangle form a discrete
metric space.

Formally, a sequence in $M$ is a function $f : \mathbb{N} \to M$. The $n$th term in the sequence
is $f(n) = p_n$. Clearly, every sequence defines a function $f : \mathbb{N} \to M$ and conversely,
every function $f : \mathbb{N} \to M$ defines a sequence in $M$. The sequence $(p_n)$ converges
to the limit $p$ in $M$ if

$$\forall \epsilon > 0 \ \exists N \in \mathbb{N} \text{ such that } n \in \mathbb{N} \text{ and } n \geq N \ \Rightarrow \ d(p_n, p) < \epsilon.$$

Limits are unique in the sense that if $(p_n)$ converges to $p$ and if $(p_n)$ also converges
to $p'$ then $p = p'$. The reason is this. Given any $\epsilon > 0$, there are integers $N$ and $N'$
such that if $n \geq N$ then $d(p_n, p) < \epsilon$ while if $n \geq N'$ then $d(p_n, p') < \epsilon$. Then for all
$n \geq \max\{N, N'\}$ we have

$$d(p, p') \leq d(p, p_n) + d(p_n, p') < \epsilon + \epsilon = 2\epsilon.$$

But $\epsilon$ is arbitrary and so $d(p, p') = 0$ and $p = p'$. (This is the $\epsilon$-principle again.)

We write $p_n \to p$, or $p_n \to p$ as $n \to \infty$, or

$$\lim_{n \to \infty} p_n = p$$

to indicate convergence. For example, in $\mathbb{R}$ the sequence $p_n = 1/n$ converges to 0 as
$n \to \infty$. In $\mathbb{R}^2$ the sequence $(1/n, \sin n)$ does not converge as $n \to \infty$. In the metric
space $\mathbb{Q}$ (with the metric it inherits from $\mathbb{R}$) the sequence $1, 1.4, 1.414, 1.4142, \ldots$
does not converge.
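For the sequence $p_n = 1/n$ the definition of convergence can be made concrete by exhibiting an $N$ for each $\epsilon$. The following sketch assumes the explicit choice $N = \lfloor 1/\epsilon \rfloor + 1$, which is one valid choice among many; the function name is hypothetical, and the finite spot-check only illustrates the inequality rather than proving it.

```python
import math

def first_index_within(eps):
    """Return an N with 1/n < eps for every n >= N (sequence p_n = 1/n, limit 0)."""
    # N > 1/eps, so n >= N gives 1/n <= 1/N < eps.
    return math.floor(1 / eps) + 1

for eps in (0.5, 0.1, 0.003):
    N = first_index_within(eps)
    # Spot-check the defining inequality d(p_n, 0) < eps on a finite stretch of indices.
    assert all(abs(1 / n - 0) < eps for n in range(N, N + 10_000))
    print(eps, N)
```

Smaller values of $\epsilon$ force larger values of $N$, which is the typical behavior: $N$ depends on $\epsilon$.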

Just as a set can have a subset, a sequence can have a subsequence. For example,
the sequence $2, 4, 6, 8, \ldots$ is a subsequence of $1, 2, 3, 4, \ldots$. The sequence
$3, 5, 7, 11, 13, 17, \ldots$ is a subsequence of $1, 3, 5, 7, 9, \ldots$, which in turn is a subsequence
of $1, 2, 3, 4, \ldots$. In general, if $(p_n)_{n \in \mathbb{N}}$ and $(q_k)_{k \in \mathbb{N}}$ are sequences and if there is a
sequence $n_1 < n_2 < \ldots$ of positive integers such that for each $k \in \mathbb{N}$ we have
$q_k = p_{n_k}$ then $(q_k)$ is a subsequence of $(p_n)$. Note that the terms in the subsequence
occur in the same order as in the mother sequence.

1 Theorem Every subsequence of a convergent sequence in $M$ converges and it
converges to the same limit as does the mother sequence.

Proof Let $(q_k)$ be a subsequence of $(p_n)$, $q_k = p_{n_k}$, where $n_1 < n_2 < \ldots$. Assume
that $(p_n)$ converges to $p$ in $M$. Given $\epsilon > 0$, there is an $N$ such that for all $n \geq N$
we have $d(p_n, p) < \epsilon$. Since $n_1, n_2, \ldots$ are positive integers we have $k \leq n_k$ for all $k$.
Thus, if $k \geq N$ then $n_k \geq N$ and $d(q_k, p) < \epsilon$. Hence $(q_k)$ converges to $p$. □

A common way to state Theorem 1 is that limits are unaffected when we pass to 
a subsequence. 


2 Continuity 

In linear algebra the objects of interest are linear transformations. In real analysis
the objects of interest are functions, especially continuous functions. A function $f$
from the metric space $M$ to the metric space $N$ is just that; $f : M \to N$ and $f$ sends
points $p \in M$ to points $fp \in N$. The function maps $M$ to $N$. The way you should
think of functions - as devices, not formulas - is discussed in Section 4 of Chapter 1.
The most common type of function maps $M$ to $\mathbb{R}$. It is a real-valued function of the
variable $p \in M$.

Definition A function $f : M \to N$ is continuous if it preserves sequential
convergence: $f$ sends convergent sequences in $M$ to convergent sequences in $N$,
limits being sent to limits. That is, for each sequence $(p_n)$ in $M$ which converges to
a limit $p$ in $M$, the image sequence $(f(p_n))$ converges to $fp$ in $N$.

Here and in what follows, the notation $fp$ is often used as convenient shorthand
for $f(p)$. The metrics on $M$ and $N$ are $d_M$ and $d_N$. We will often refer to either
metric as just $d$.

2 Theorem The composite of continuous functions is continuous. 

Proof Let $f : M \to N$ and $g : N \to P$ be continuous and assume that

$$\lim_{n \to \infty} p_n = p$$

in $M$. Since $f$ is continuous, $\lim_{n \to \infty} f(p_n) = fp$. Since $g$ is continuous,
$\lim_{n \to \infty} g(f(p_n)) = g(fp)$ and therefore $g \circ f : M \to P$ is continuous. See Figure 29. □

Moral The sequence condition is the easy way to tell at a glance whether a function
is continuous.

Figure 29 The composite function $g \circ f$

There are two “obviously” continuous functions. 

3 Proposition For every metric space $M$ the identity map $\mathrm{id} : M \to M$ is continuous,
and so is every constant function $f : M \to N$.

Proof Let $p_n \to p$ in $M$. Then $\mathrm{id}(p_n) = p_n \to p = \mathrm{id}(p)$ as $n \to \infty$ which gives
continuity of the identity map. Likewise, if $f(x) = q \in N$ for all $x \in M$ and if $p_n \to p$
in $M$ then $fp = q$ and $f(p_n) = q$ for all $n$. Thus $f(p_n) \to fp$ as $n \to \infty$ which gives
continuity of the constant function $f$. □


Homeomorphism 


Vector spaces are isomorphic if there is a linear bijection from one to the other.
When are metric spaces isomorphic? They should “look the same.” The letters Y
and T look the same; and they look different from the letter O. If $f : M \to N$
is a bijection and $f$ is continuous and the inverse bijection $f^{-1} : N \to M$ is also
continuous then $f$ is a homeomorphism† (or a “homeo” for short) and $M, N$ are
homeomorphic. We write $M \cong N$ to indicate that $M$ and $N$ are homeomorphic.
$\cong$ is an equivalence relation: $M \cong M$ since the identity map is a homeomorphism
$M \to M$; $M \cong N$ clearly implies that $N \cong M$; and the previous theorem shows that
the composite of homeomorphisms is a homeomorphism.

†This is a rare case in mathematics in which spelling is important. Homeomorphism $\neq$
homomorphism.

Geometrically speaking, a homeomorphism is a bijection that can bend, twist,
stretch, and wrinkle the space $M$ to make it coincide with $N$, but it cannot rip,
puncture, shred, or pulverize $M$ in the process. The basic questions to ask about
metric spaces are:

(a) Given $M$ and $N$, are they homeomorphic?

(b) What are the continuous functions from $M$ to $N$?

A major goal of this chapter is to show you how to answer these questions in
many cases. For example, is the circle homeomorphic to the interval? To the sphere?
etc. Figure 30 indicates that the circle and the (perimeter of the) triangle are
homeomorphic, while Figure 15 shows that $(a, b)$, the semicircle, and $\mathbb{R}$ are homeomorphic.



Figure 30 The circle and triangle are homeomorphic. 


A natural question that should occur to you is whether continuity of $f^{-1}$ is actually
implied by continuity of a bijection $f$. It is not. Here is an instructive example.

Consider the interval $[0, 2\pi) = \{x \in \mathbb{R} : 0 \leq x < 2\pi\}$ and define $f : [0, 2\pi) \to S^1$
to be the mapping $f(x) = (\cos x, \sin x)$ where $S^1$ is the unit circle in the plane.
The mapping $f$ is a continuous bijection, but the inverse bijection is not continuous.
For there is a sequence of points $(z_n)$ on $S^1$ in the fourth quadrant that converges
to $p = (1, 0)$ from below, and $f^{-1}(z_n)$ does not converge to $f^{-1}(p) = 0$. Rather it
converges to $2\pi$. Thus, $f$ is a continuous bijection whose inverse bijection fails to
be continuous. See Figure 31. In Exercises 49 and 50 you are asked to find worse
examples of continuous bijections that are not homeomorphisms.
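The failure of $f^{-1}$ to preserve sequential convergence can be watched numerically. In this sketch the particular sequence $z_n = f(2\pi - 1/n)$ and the `atan2`-based formula for the inverse are assumptions chosen for illustration; any sequence approaching $(1, 0)$ from below would serve.

```python
import math

def f(x):
    """Wrap [0, 2*pi) onto the unit circle."""
    return (math.cos(x), math.sin(x))

def f_inverse(z):
    """Inverse bijection: the angle of z, normalized to lie in [0, 2*pi)."""
    angle = math.atan2(z[1], z[0])      # atan2 returns a value in [-pi, pi]
    return angle if angle >= 0 else angle + 2 * math.pi

# Points z_n in the fourth quadrant approaching p = (1, 0) from below.
for n in (1, 10, 100, 1000):
    z_n = f(2 * math.pi - 1 / n)
    print(n, z_n, f_inverse(z_n))

print(f_inverse((1.0, 0.0)))   # the angle assigned to p itself
```

The printed points $z_n$ approach $(1, 0)$, yet their preimages climb toward $2\pi$ while the preimage of the limit point is 0.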


To build your intuition about continuous mappings and homeomorphisms, consider
the following examples shown in Figure 32 - the unit circle in the plane, a trefoil
knot in $\mathbb{R}^3$, the perimeter of a square, the surface of a donut (the 2-torus), the surface
of a ceramic coffee cup, the unit interval $[0, 1]$, the unit disc including its boundary.
Equip all with the inherited metric. Which should be homeomorphic to which?

Figure 31 $f$ wraps $[0, 2\pi)$ bijectively onto the circle.







Figure 32 Seven metric spaces

The $(\epsilon, \delta)$-Condition

The following theorem presents the more familiar (but equivalent!) definition
of continuity using $\epsilon$ and $\delta$. It corresponds to the definition given in Chapter 1 for
real-valued functions of a real variable.

4 Theorem $f : M \to N$ is continuous if and only if it satisfies the $(\epsilon, \delta)$-condition:
For each $\epsilon > 0$ and each $p \in M$ there exists $\delta > 0$ such that if $x \in M$ and $d_M(x, p) < \delta$
then $d_N(fx, fp) < \epsilon$.

Proof Suppose that $f$ is continuous. It preserves sequential convergence. From the
supposition that $f$ fails to satisfy the $(\epsilon, \delta)$-condition at some $p \in M$ we will derive
a contradiction. If the $(\epsilon, \delta)$-condition fails at $p$ then there exists $\epsilon > 0$ such that for
each $\delta > 0$ there is a point $x$ with $d(x, p) < \delta$ and $d(fx, fp) \geq \epsilon$. Taking $\delta = 1/n$
we get a sequence $(x_n)$ with $d(x_n, p) < 1/n$ and $d(f(x_n), fp) \geq \epsilon$. Then $x_n \to p$ but
$f(x_n)$ does not converge to $fp$, which contradicts the fact that $f$ preserves sequential
convergence. Since the supposition that $f$ fails to satisfy the $(\epsilon, \delta)$-condition leads
to a contradiction, $f$ actually does satisfy the $(\epsilon, \delta)$-condition.

To check the converse, suppose that $f$ satisfies the $(\epsilon, \delta)$-condition at $p$. For each
sequence $(x_n)$ in $M$ that converges to $p$ we must show $f(x_n) \to fp$ in $N$ as $n \to \infty$.
Let $\epsilon > 0$ be given. There is $\delta > 0$ such that $d_M(x, p) < \delta \Rightarrow d_N(fx, fp) < \epsilon$.
Convergence of $x_n$ to $p$ implies there is an integer $K$ such that for all $n \geq K$ we have
$d_M(x_n, p) < \delta$, and therefore $d_N(f(x_n), fp) < \epsilon$. That is, $f(x_n) \to fp$ as $n \to \infty$. See
also Exercise 13. □
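As a concrete instance of the $(\epsilon, \delta)$-condition, take $f(x) = x^2$ on $\mathbb{R}$. The sketch below assumes the choice $\delta = \min(1, \epsilon / (2|p| + 1))$, which works because $|x^2 - p^2| = |x - p|\,|x + p| < \delta(2|p| + \delta) \leq \delta(2|p| + 1) \leq \epsilon$. The function names are hypothetical and the sampling is only a sanity check of that inequality.

```python
def delta_for_square(p, eps):
    """A delta that works for f(x) = x*x at the point p (one valid choice, not the largest)."""
    return min(1.0, eps / (2 * abs(p) + 1))

def check(p, eps, samples=10_000):
    """Sample points x with |x - p| < delta and confirm |f(x) - f(p)| < eps."""
    delta = delta_for_square(p, eps)
    for i in range(1, samples):
        t = -1 + 2 * i / samples          # t ranges over the open interval (-1, 1)
        x = p + t * delta
        if abs(x * x - p * p) >= eps:
            return False
    return True

print(check(0.0, 0.1), check(3.0, 0.01), check(-50.0, 1e-3))
```

Note that this $\delta$ shrinks as $|p|$ grows, so it depends on both $p$ and $\epsilon$, which the $(\epsilon, \delta)$-condition permits.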


3 The Topology of a Metric Space 

Now we come to two basic concepts in metric space theory - closedness and openness.
Let $M$ be a metric space and let $S$ be a subset of $M$. A point $p \in M$ is a limit
of $S$ if there exists a sequence $(p_n)$ in $S$ that converges to it.†

†A limit of $S$ is also sometimes called a limit point of $S$. Take care though: Some mathematicians
require that a limit point of $S$ be the limit of a sequence of distinct points of $S$. They would say that
a finite set has no limit points. We will not adopt their point of view. Another word used in this
context, especially by the French, is “adherence.” A point $p$ adheres to the set $S$ if and only if $p$
is a limit of $S$. In more general circumstances, limits are defined using “nets” instead of sequences.
They are like “uncountable sequences.” You can read more about nets in graduate-level topology
books such as Topology by James Munkres.

Definition $S$ is a closed set if it contains all its limits.‡

Definition $S$ is an open set if for each $p \in S$ there exists an $r > 0$ such that

$$d(p, q) < r \ \Rightarrow \ q \in S.$$

5 Theorem Openness is dual to closedness: The complement of an open set is a
closed set and the complement of a closed set is an open set.

Proof Suppose that $S \subset M$ is an open set. We claim that $S^c$ is a closed set. If
$p_n \to p$ and $p_n \in S^c$ we must show that $p \in S^c$. Well, if $p \notin S^c$ then $p \in S$ and, since
$S$ is open, there is an $r > 0$ such that

$$d(p, q) < r \ \Rightarrow \ q \in S.$$

Since $p_n \to p$, we have $d(p, p_n) < r$ for all large $n$, which implies that $p_n \in S$,
contrary to the sequence being in $S^c$. Since the supposition that $p$ lies in $S$ leads to
a contradiction, $p$ actually does lie in $S^c$, proving that $S^c$ is a closed set.

Suppose that $S$ is a closed set. We claim that $S^c$ is open. Take any $p \in S^c$. If
there fails to exist an $r > 0$ such that

$$d(p, q) < r \ \Rightarrow \ q \in S^c$$

then for each $r = 1/n$ with $n = 1, 2, \ldots$ there exists a point $p_n \in S$ such that
$d(p, p_n) < 1/n$. This sequence in $S$ converges to $p \in S^c$, contrary to closedness of $S$.
Therefore there actually does exist an $r > 0$ such that

$$d(p, q) < r \ \Rightarrow \ q \in S^c$$

which proves that $S^c$ is an open set. □


Most sets, like doors, are neither open nor closed, but ajar. Keep this in mind.
For example neither $(a, b]$ nor its complement is closed in $\mathbb{R}$; $(a, b]$ is neither closed
nor open. Unlike doors, however, sets can be both open and closed at the same
time. For example, the empty set $\emptyset$ is a subset of every metric space and it is always
closed. There are no sequences and no limits to even worry about. Similarly the
full metric space $M$ is a closed subset of itself: For it certainly contains the limit of
every sequence that converges in $M$. Thus, $\emptyset$ and $M$ are closed subsets of $M$. Their
complements, $M$ and $\emptyset$, are therefore open: $\emptyset$ and $M$ are both closed and open.

‡Note how similarly algebraists use the word “closed.” A field (or group or ring, etc.) is closed
under its arithmetic operations: Sums, differences, products, and quotients of elements in the field
still lie in the field. In our case it is limits. Limits of sequences in $S$ must lie in $S$.

Subsets of $M$ that are both closed and open are clopen. See also Exercise 125. It
turns out that in $\mathbb{R}$ the only clopen sets are $\emptyset$ and $\mathbb{R}$. In $\mathbb{Q}$, however, things are quite
different, sets such as $\{r \in \mathbb{Q} : -\sqrt{2} < r < \sqrt{2}\}$ being clopen in $\mathbb{Q}$. To summarize,

A subset of a metric space can be
closed, open, both, or neither.

You should expect the “typical” subset of a metric space to be neither closed nor
open.

The topology of $M$ is the collection $\mathcal{T}$ of all open subsets of $M$.

6 Theorem $\mathcal{T}$ has three properties.† As a system it is closed under union, finite
intersection, and it contains $\emptyset$, $M$. That is,

(a) Every union of open sets is an open set.

(b) The intersection of finitely many open sets is an open set.

(c) $\emptyset$ and $M$ are open sets.

Proof (a) If $\{U_\alpha\}$ is any collection‡ of open subsets of $M$ and $V = \bigcup U_\alpha$ then $V$ is
open. For if $p \in V$ then $p$ belongs to at least one $U_\alpha$ and there is an $r > 0$ such that

$$d(p, q) < r \ \Rightarrow \ q \in U_\alpha.$$

Since $U_\alpha \subset V$, this implies that all such $q$ lie in $V$, proving that $V$ is open.

(b) If $U_1, \ldots, U_n$ are open sets and $W = \bigcap U_k$ then $W$ is open. For if $p \in W$ then
for each $k$, $1 \leq k \leq n$, there is an $r_k > 0$ such that

$$d(p, q) < r_k \ \Rightarrow \ q \in U_k.$$

Take $r = \min\{r_1, \ldots, r_n\}$. Then $r > 0$ and

$$d(p, q) < r \ \Rightarrow \ q \in U_k,$$

for each $k$; i.e., $q \in W = \bigcap U_k$, proving that $W$ is open.

(c) It is clear that $\emptyset$ and $M$ are open sets. □

†Any collection $\mathcal{T}$ of subsets of a set $X$ that satisfies these three properties is called a topology on
$X$, and $X$ is called a topological space. Topological spaces are more general than metric spaces:
There exist topologies that do not arise from a metric. Think of them as pathological. The question
of which topologies can be generated by a metric and which cannot is discussed in Topology by
Munkres. See also Exercise 30.

‡The $\alpha$ in the notation $U_\alpha$ “indexes” the sets. If $\alpha = 1, 2, \ldots$ then the collection is countable, but
we are just as happy to let $\alpha$ range through uncountable index sets.

7 Corollary The intersection of any number of closed sets is a closed set; the finite
union of closed sets is a closed set; $\emptyset$ and $M$ are closed sets.

Proof Take complements and use De Morgan’s laws. If $\{K_\alpha\}$ is a collection of closed
sets then $U_\alpha = (K_\alpha)^c$ is open and

$$K = \bigcap K_\alpha = \Big(\bigcup U_\alpha\Big)^c.$$

Since $\bigcup U_\alpha$ is open, its complement $K$ is closed. Similarly, a finite union of closed
sets is the complement of the finite intersection of their complements, and is a closed
set. □


What about an infinite union of closed sets? Generally, it is not closed. For 
example, the interval [1/n, 1] is closed in R, but the union of these intervals as n 
ranges over N is the interval (0, 1] which is not closed in R. Neither is the infinite 
intersection of open sets open in general. 
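One standard instance of the last remark, offered here as an illustration rather than as an example taken from the text: each interval $(-1/n, 1/n)$ is open in $\mathbb{R}$, yet the intersection of all of them is a single point.

```latex
\[
  \bigcap_{n \in \mathbb{N}} \left( -\tfrac{1}{n}, \tfrac{1}{n} \right) = \{0\}.
\]
% 0 lies in every interval, while any x \neq 0 is excluded as soon as n > 1/|x|.
% The set \{0\} is not open in \mathbb{R}: no interval (-r, r) with r > 0 is contained in it.
```

This is the dual of the closed-set example above, obtained by the same kind of shrinking family.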

Two sets whose closedness/openness properties are basic are:

$$\lim S = \{p \in M : p \text{ is a limit of } S\}$$

$$M_r p = \{q \in M : d(p, q) < r\}.$$

The former is the limit set of $S$; the latter is the $r$-neighborhood of $p$.

8 Theorem $\lim S$ is a closed set and $M_r p$ is an open set.

Proof Simple but not immediate! See Figure 33.

Suppose that $p_n \to p$ and each $p_n$ lies in $\lim S$. We claim that $p \in \lim S$. Since
$p_n$ is a limit of $S$ there is a sequence $(p_{n,k})_{k \in \mathbb{N}}$ in $S$ that converges to $p_n$ as $k \to \infty$.
Thus there exists $q_n = p_{n,k(n)} \in S$ such that

$$d(p_n, q_n) < \frac{1}{n}.$$

Then, as $n \to \infty$ we have

$$d(p, q_n) \leq d(p, p_n) + d(p_n, q_n) \to 0$$

which implies that $q_n \to p$, so $p \in \lim S$, which completes the proof that $\lim S$ is a
closed set.
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Qn 
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Pn,k 

• • • • 

• • • • 



Pn 


Figure 33 S — (0, 1) x (0, 1) and p n — (1/n, 0) converges to p = (0, 0) as 
n — ^ oo. Each p n is the limit of the sequence p n ^ — (1/n, 1/k) as k — > oo. 
The sequence q n — (1/n, 1/n) lies in S and converges to (0,0). Hence: The 

limits of limits are limits. 



Figure 34 Why the r-neighborhood of p is an open set 
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To check that M r p is an open set, take any q G M r p and observe that 

s — r — d(p, q ) > 0. 

By the triangle inequality, if d(q, x) < s then 

d(p , x) < d(p, q) + d(q, x ) < r, 

and hence M s q C M r p. See Figure 34. Since each q G M r p has some M s q that is 
contained in M r p, M r p is an open set. □ 


9 Corollary The interval $(a, b)$ is open in $\mathbb{R}$ and so are $(-\infty, b)$, $(a, \infty)$, and $(-\infty, \infty)$.
The interval $[a, b]$ is closed in $\mathbb{R}$.

Proof $(a, b)$ is the $r$-neighborhood of its midpoint $m = (a + b)/2$ where $r = (b - a)/2$.
Therefore $(a, b)$ is open in $\mathbb{R}$. Since the union of open sets is open we see that

$$\bigcup_{n \in \mathbb{N}} (b - n, b - 1/n) = (-\infty, b)$$

is open. The same applies to $(a, \infty)$. The whole metric space $\mathbb{R} = (-\infty, \infty)$ is always
open in itself.

Since the complement of $[a, b]$ is the open set $(-\infty, a) \cup (b, \infty)$, the interval $[a, b]$
is closed. □


10 Corollary $\lim S$ is the “smallest” closed set that contains $S$ in the sense that if
$K \supset S$ and $K$ is closed then $K \supset \lim S$.

Proof Obvious. $K$ must contain the limit of each sequence in $K$ that converges in
$M$ and therefore it must contain the limit of each sequence in $S \subset K$ that converges
in $M$. These limits are exactly $\lim S$. □

We refer to $\lim S$ as the closure of $S$ and denote it also as $\overline{S}$. You start with $S$
and make it closed by adding all its limits. You don’t need to add any more points
because limits of limits are limits. That is, $\lim(\lim S) = \lim S$. An operation like
this is called idempotent. Doing the operation twice produces the same outcome as
doing it once.


A neighborhood of a point $p$ in $M$ is any open set $V$ that contains $p$. Theorem 8
implies that $V = M_r p$ is a neighborhood of $p$. Eventually, you will run across the
phrase “closed neighborhood” of $p$, which refers to a closed set that contains an open
set that contains $p$. However, until further notice all neighborhoods are open.

Usually, sets defined by strict inequalities are open while those defined by equalities
or nonstrict inequalities are closed. Examples of closed sets in $\mathbb{R}$ are finite sets,
$[a, b]$, $\mathbb{N}$, and the set $\{0\} \cup \{1/n : n \in \mathbb{N}\}$. Each contains all its limits. Examples of
open sets in $\mathbb{R}$ are open intervals, bounded or not.

Topological Description of Continuity 

A property of a metric space or of a mapping between metric spaces that can 
be described solely in terms of open sets (or equivalently, in terms of closed sets) is 
called a topological property. The next result describes continuity topologically. 



Figure 35 The function $f : (x, y) \mapsto x^2 + y^2 + 2$ and its graph over the
preimage of $[3, 6]$

Let $f : M \to N$ be given. The preimage† of a set $V \subset N$ is

$$f^{\mathrm{pre}}(V) = \{p \in M : f(p) \in V\}.$$

For example, if $f : \mathbb{R}^2 \to \mathbb{R}$ is the function defined by the formula

$$f(x, y) = x^2 + y^2 + 2$$

then the preimage of the interval $[3, 6]$ in $\mathbb{R}$ is the annulus in the plane with inner
radius 1 and outer radius 2. Figure 35 shows the domain of $f$ as $\mathbb{R}^2$ and the target
as $\mathbb{R}$. The range is the set of real numbers $\geq 2$. The graph of $f$ is a paraboloid
with lowest point $(0, 0, 2)$. The second part of the figure shows the portion of the
graph lying above the annulus. You will find it useful to keep in mind the distinctions
among the concepts: function, range, and graph.

†The preimage of $V$ is also called the inverse image of $V$ and is denoted by $f^{-1}(V)$. Unless $f$
is a bijection, this notation leads to confusion. There may be no map $f^{-1}$ and yet expressions like
$V \supset f(f^{-1}(V))$ are written that mix maps and nonmaps. By the way, if $f$ sends no point of $M$ into
$V$ then $f^{\mathrm{pre}}(V)$ is the empty set.
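The annulus claim can be checked pointwise: $3 \leq x^2 + y^2 + 2 \leq 6$ holds exactly when $1 \leq x^2 + y^2 \leq 4$. The sketch below compares the two descriptions on a grid; the helper names and the grid are assumptions introduced for illustration, and grid points with denominator 8 are used so that the floating-point arithmetic is exact.

```python
def f(x, y):
    return x * x + y * y + 2

def in_preimage(x, y, lo=3.0, hi=6.0):
    """True when (x, y) lies in the preimage of the closed interval [lo, hi]."""
    return lo <= f(x, y) <= hi

def in_annulus(x, y, inner=1.0, outer=2.0):
    """True when (x, y) lies in the closed annulus inner <= |(x, y)| <= outer."""
    return inner * inner <= x * x + y * y <= outer * outer

# Compare the two descriptions on a grid covering [-3, 3] x [-3, 3].
grid = [i / 8 for i in range(-24, 25)]
print(all(in_preimage(x, y) == in_annulus(x, y) for x in grid for y in grid))
```

Points on the two boundary circles belong to the preimage because the interval $[3, 6]$ includes its endpoints.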

11 Theorem The following are equivalent for continuity of $f : M \to N$.

(i) The $(\epsilon, \delta)$-condition.

(ii) The sequential convergence preservation condition.

(iii) The closed set condition: The preimage of each closed set in $N$ is closed in
$M$.

(iv) The open set condition: The preimage of each open set in $N$ is open in $M$.

Proof Totally natural! By Theorem 4, (i) implies (ii).

(ii) implies (iii). If $K \subset N$ is closed in $N$ and $p_n \in f^{\mathrm{pre}}(K)$ converges to $p \in M$
then we claim that $p \in f^{\mathrm{pre}}(K)$. By (ii), $f$ preserves sequential convergence so

$$\lim_{n \to \infty} f(p_n) = fp.$$

Since $K$ is closed in $N$, $fp \in K$, so $p \in f^{\mathrm{pre}}(K)$, and we see that $f^{\mathrm{pre}}(K)$ is closed in
$M$. Thus (ii) implies (iii).

(iii) implies (iv). This follows by taking complements: $(f^{\mathrm{pre}}(U))^c = f^{\mathrm{pre}}(U^c)$.

(iv) implies (i). Let $\epsilon > 0$ and $p \in M$ be given. $N_\epsilon(fp)$ is open in $N$, so its
preimage $U = f^{\mathrm{pre}}(N_\epsilon(fp))$ is open in $M$. The point $p$ belongs to the preimage so
openness of $U$ implies there is a $\delta > 0$ such that $M_\delta(p) \subset U$. Then

$$f(M_\delta(p)) \subset fU \subset N_\epsilon(fp)$$

gives the $\epsilon, \delta$ condition, $d_M(p, x) < \delta \Rightarrow d_N(fp, fx) < \epsilon$. See Figure 36. □


I hope you find the closed and open set characterizations of continuity elegant. 
Note that no explicit mention is made of the metric. The open set condition is purely 
topological. It would be perfectly valid to take as a definition of continuity that the 
preimage of each open set is open. In fact this is exactly what’s done in general 
topology. 

12 Corollary A homeomorphism $f : M \to N$ bijects the collection of open sets in
$M$ to the collection of open sets in $N$. It bijects the topologies.

Figure 36 The $(\epsilon, \delta)$-condition for a continuous function $f : M \to N$

Proof Let $V$ be an open set in $N$. By Theorem 11, since $f$ is continuous, the preimage
of $V$ is open in $M$. Since $f$ is a bijection, this preimage $U = \{p \in M : fp \in V\}$ is
exactly the image of $V$ by the inverse bijection, $U = f^{-1}(V)$. The same thing can be
said about $f^{-1}$ since $f^{-1}$ is also a homeomorphism. That is, $V = fU$. Thus, sending
$U$ to $fU$ bijects the topology of $M$ to the topology of $N$. □

Because of this corollary, a homeomorphism is also called a topological equivalence.

In general, continuous maps do not need to send open sets to open sets. For
example, the squaring map $x \mapsto x^2$ from $\mathbb{R}$ to $\mathbb{R}$ is continuous but it sends the open
interval $(-1, 1)$ to the nonopen interval $[0, 1)$. See also Exercise 28.

Inheritance 

If a set $S$ is contained in a metric subspace $N \subset M$ you need to be careful when
you say that $S$ is open or closed. For example,

$S = \{x \in \mathbb{Q} : -\pi < x < \pi\}$

is a subset of the metric subspace $\mathbb{Q} \subset \mathbb{R}$. It is both open and closed with respect to
$\mathbb{Q}$ but is neither open nor closed with respect to $\mathbb{R}$. To avoid this kind of ambiguity
it is best to say that $S$ is clopen “with respect to $\mathbb{Q}$ but not with respect to $\mathbb{R}$,” or
more briefly that $S$ is clopen “in $\mathbb{Q}$ but not in $\mathbb{R}$.” Nevertheless there is a simple
relation between the topologies of $M$ and $N$ when $N$ is a metric subspace of $M$.
13 Inheritance Principle Every metric subspace $N$ of $M$ inherits its topology
from $M$ in the sense that each subset $V \subset N$ which is open in $N$ is actually the
intersection $V = N \cap U$ for some $U \subset M$ that is open in $M$, and vice versa.

Proof It all boils down to the fact that for each $p \in N$ we have

$N_r p = N \cap M_r p.$

After all, $N_r p$ is the set of $x \in N$ such that $d_N(x, p) < r$ and this is exactly the
same as the set of those $x \in M_r p$ that belong to $N$. Therefore $N$ inherits its
$r$-neighborhoods from $M$. Since its open sets are unions of its $r$-neighborhoods, $N$ also
inherits its open sets from $M$.

In more detail, if $V$ is open in $N$ then it is the union of those $N_r p$ with $N_r p \subset V$.
Each such $N_r p$ is $N \cap M_r p$ and the union of these $M_r p$ is $U$, an open subset of
$M$. The intersection $N \cap U$ equals $V$. Conversely, if $U$ is any open subset of $M$
and $p \in V = N \cap U$ then openness of $U$ implies there is an $M_r p \subset U$. Thus
$N_r p = N \cap M_r p \subset V$, which shows that $V$ is open in $N$. □

14 Corollary Every metric subspace of $M$ inherits its closed sets from $M$.

Proof By “inheriting its closed sets” we mean that each closed subset $L \subset N$ is the
intersection $N \cap K$ for some closed subset $K \subset M$ and vice versa. The proof consists
of two words: “Take complements.” See also Exercise 34. □


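Let’s return to the example with $\mathbb{Q} \subset \mathbb{R}$ and $S = \{x \in \mathbb{Q} : -\pi < x < \pi\}$. The
set $S$ is clopen in $\mathbb{Q}$, so we should have $S = \mathbb{Q} \cap U$ for some open set $U \subset \mathbb{R}$ and
$S = \mathbb{Q} \cap K$ for some closed set $K \subset \mathbb{R}$. In fact $U$ and $K$ are

$U = (-\pi, \pi)$ and $K = [-\pi, \pi].$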


15 Corollary Assume that $N$ is a metric subspace of $M$ and also is a closed subset
of $M$. A set $L \subset N$ is closed in $N$ if and only if it is closed in $M$. Similarly, if $N$ is
a metric subspace of $M$ and also is an open subset of $M$ then $U \subset N$ is open in $N$ if
and only if it is open in $M$.

Proof The proof is left to the reader as Exercise 34. □
Product Metrics

We next define a metric on the Cartesian product $M = X \times Y$ of two metric
spaces. There are three natural ways to do so:

$d_E(p, p') = \sqrt{d_X(x, x')^2 + d_Y(y, y')^2}$
$d_{\max}(p, p') = \max\{d_X(x, x'), d_Y(y, y')\}$
$d_{\mathrm{sum}}(p, p') = d_X(x, x') + d_Y(y, y')$

where $p = (x, y)$ and $p' = (x', y')$ belong to $M$. ($d_E$ is the Euclidean product
metric.) The proof that these expressions actually define metrics on $M$ is left as
Exercise 38.

16 Proposition $d_{\max} \leq d_E \leq d_{\mathrm{sum}} \leq 2 d_{\max}$.

Proof Dropping the smaller term inside the square root shows that $d_{\max} \leq d_E$;
comparing the square of $d_E$ and the square of $d_{\mathrm{sum}}$ shows that the latter has the
terms of the former and the cross term besides, so $d_E \leq d_{\mathrm{sum}}$; and clearly $d_{\mathrm{sum}}$ is no
larger than twice its greater term, so $d_{\mathrm{sum}} \leq 2 d_{\max}$. □
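As a sanity check, the chain of inequalities can be tested on random pairs of points. This sketch assumes the special case $X = Y = \mathbb{R}$ with $d_X = d_Y = |\cdot|$; the function names and the small tolerance (used to absorb floating-point rounding) are illustrative choices.

```python
import math
import random

def product_metrics(p, q):
    """Return (d_max, d_E, d_sum) for p, q in R x R, taking d_X = d_Y = |.|."""
    dx = abs(p[0] - q[0])
    dy = abs(p[1] - q[1])
    return max(dx, dy), math.hypot(dx, dy), dx + dy

def chain_holds(p, q, tol=1e-12):
    """Check d_max <= d_E <= d_sum <= 2 d_max, up to a rounding tolerance."""
    d_max, d_e, d_sum = product_metrics(p, q)
    return d_max <= d_e + tol and d_e <= d_sum + tol and d_sum <= 2 * d_max + tol

rng = random.Random(0)  # fixed seed so the run is reproducible
pairs = [((rng.uniform(-10, 10), rng.uniform(-10, 10)),
          (rng.uniform(-10, 10), rng.uniform(-10, 10))) for _ in range(1000)]
all_ok = all(chain_holds(p, q) for p, q in pairs)
```

The pair $(0, 0)$, $(1, 1)$ shows that the last inequality can be an equality: there $d_{\mathrm{sum}} = 2 = 2 d_{\max}$.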

17 Convergence in a Product Space The following are equivalent for a sequence
$p_n = (p_{1n}, p_{2n})$ in $M = M_1 \times M_2$:

(a) $(p_n)$ converges with respect to the metric $d_{\max}$.

(b) $(p_n)$ converges with respect to the metric $d_E$.

(c) $(p_n)$ converges with respect to the metric $d_{\mathrm{sum}}$.

(d) $(p_{1n})$ and $(p_{2n})$ converge in $M_1$ and $M_2$ respectively.

Proof This is immediate from Proposition 16. □

18 Corollary If $f : M \to N$ and $g : X \to Y$ are continuous then so is their
Cartesian product $f \times g$ which sends $(p, x) \in M \times X$ to $(fp, gx) \in N \times Y$.

Proof If $(p_n, x_n) \to (p, x)$ in $M \times X$ then Theorem 17 implies $p_n \to p$ in $M$ and
$x_n \to x$ in $X$. By continuity, $f(p_n) \to fp$ and $g(x_n) \to gx$. By Theorem 17,
$(f(p_n), g(x_n)) \to (fp, gx)$ which gives continuity of $f \times g$. □


The three metrics make sense in the obvious way for a Cartesian product of $m \geq 3$
metric spaces. The inequality

$d_{\max} \leq d_E \leq d_{\mathrm{sum}} \leq m \, d_{\max}$

is proved in the same way, and we see that Theorem 17 holds also for the product of
$m$ metric spaces. This gives
19 Corollary (Convergence in $\mathbb{R}^m$) A sequence of vectors $(v_n)$ in $\mathbb{R}^m$ converges
in $\mathbb{R}^m$ if and only if each of its component sequences $(v_{in})$ converges, $1 \leq i \leq m$. The
limit of the vector sequence is the vector

$v = \lim_{n \to \infty} v_n = \left( \lim_{n \to \infty} v_{1n}, \lim_{n \to \infty} v_{2n}, \ldots, \lim_{n \to \infty} v_{mn} \right).$


The distance function $d : M \times M \to \mathbb{R}$ is a function from the metric space $M \times M$
to the metric space $\mathbb{R}$, so the following assertion makes sense.

20 Theorem d is continuous. 

Proof We check $(\epsilon, \delta)$-continuity with respect to the metric $d_{\mathrm{sum}}$. Given $\epsilon > 0$ we
take $\delta = \epsilon$. If $d_{\mathrm{sum}}((p, q), (p', q')) < \delta$ then the triangle inequality gives

$d(p, q) \leq d(p, p') + d(p', q') + d(q', q) < d(p', q') + \epsilon$
$d(p', q') \leq d(p', p) + d(p, q) + d(q, q') < d(p, q) + \epsilon$

which gives

$d(p, q) - \epsilon < d(p', q') < d(p, q) + \epsilon.$

Thus $|d(p', q') - d(p, q)| < \epsilon$ and we get continuity with respect to the metric $d_{\mathrm{sum}}$.
By Theorem 17 it does not matter which metric we use on $M \times M$. □
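The two triangle-inequality estimates in the proof combine into the single bound $|d(p', q') - d(p, q)| \leq d(p, p') + d(q, q')$, which is the sum-metric distance between the pairs. Here is a small numerical sketch of that bound, assuming $M = \mathbb{R}^2$ with the Euclidean distance; the function names and the sampling are illustrative only.

```python
import math
import random

def d(p, q):
    """Euclidean distance in R^2, standing in for the metric on M."""
    return math.dist(p, q)

def lipschitz_gap(p, q, p2, q2):
    """d_sum((p, q), (p2, q2)) - |d(p2, q2) - d(p, q)|.

    The triangle inequality says this quantity is never negative.
    """
    return (d(p, p2) + d(q, q2)) - abs(d(p2, q2) - d(p, q))

rng = random.Random(1)  # fixed seed for reproducibility

def random_point():
    return (rng.uniform(-5, 5), rng.uniform(-5, 5))

gaps = [lipschitz_gap(random_point(), random_point(), random_point(), random_point())
        for _ in range(1000)]
smallest_gap = min(gaps)  # expected to be >= 0 up to rounding error
```

In other words $d$ is not merely continuous but Lipschitz with constant 1 with respect to $d_{\mathrm{sum}}$, which is why $\delta = \epsilon$ works.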


As you can see, the use of $d_{\mathrm{sum}}$ simplifies the proof by avoiding square root
manipulations. The sum metric is also called the Manhattan metric or the taxicab
metric. Figure 37 shows the “unit discs” with respect to these metrics in $\mathbb{R}^2$. See
also Exercise 2.

21 Corollary The metrics $d_{\max}$, $d_E$, and $d_{\mathrm{sum}}$ are continuous.

Proof Theorem 20 asserts that all metrics are continuous. □


22 Corollary The absolute value is a continuous mapping $\mathbb{R} \to \mathbb{R}$. In fact the norm
is a continuous mapping from any normed space to $\mathbb{R}$.

Proof $\|v\| = d(v, 0)$. □
Figure 37 The unit disc in the max metric is a square, and in the sum metric it is a rhombus.


Completeness 

In Chapter 1 we discussed the Cauchy criterion for convergence of a sequence of
real numbers. There is a natural way to carry these ideas over to a metric space $M$.
The sequence $(p_n)$ in $M$ satisfies a Cauchy condition provided that for each $\epsilon > 0$
there is an integer $N$ such that for all $k, n \geq N$ we have $d(p_k, p_n) < \epsilon$, and $(p_n)$ is
said to be a Cauchy sequence. In symbols,

$\forall \epsilon > 0 \ \exists N$ such that $k, n \geq N \Rightarrow d(p_k, p_n) < \epsilon.$

The terms of a Cauchy sequence “bunch together” as $n \to \infty$. Each convergent
sequence $(p_n)$ is Cauchy. For if $(p_n)$ converges to $p$ as $n \to \infty$ then, given $\epsilon > 0$, there
is an $N$ such that for all $n \geq N$ we have

$d(p_n, p) < \epsilon / 2.$

By the triangle inequality, if $k, n \geq N$ then

$d(p_k, p_n) \leq d(p_k, p) + d(p, p_n) < \epsilon,$

so convergence $\Rightarrow$ Cauchy.

Theorem 1.5 states that the converse is true in the metric space R. Every Cauchy 
sequence in R converges to a limit in R. In the general metric space, however, this 
need not be true. For example, consider the metric space $\mathbb{Q}$ of rational numbers,
equipped with the inherited metric $d(x, y) = |x - y|$, and consider the sequence

$(r_n) = (1.4, 1.41, 1.414, 1.4142, \ldots).$

It is Cauchy. Given $\epsilon > 0$, choose $N > -\log_{10} \epsilon$. If $k, n \geq N$ then $|r_k - r_n| \leq
10^{-N} < \epsilon$. Nevertheless, $(r_n)$ refuses to converge in $\mathbb{Q}$. After all, as a sequence in
$\mathbb{R}$ it converges to $\sqrt{2}$, and if it also converges to some $r \in \mathbb{Q}$, then by uniqueness of
limits in $\mathbb{R}$ we have $r = \sqrt{2}$, something we know is false. In brief, convergence $\Rightarrow$
Cauchy but not conversely.
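The estimate $|r_k - r_n| \leq 10^{-N}$ can be checked with exact rational arithmetic. The sketch below assumes that $r_n$ means the $n$-digit decimal truncation of $\sqrt{2}$, which matches the terms listed above; the function names are illustrative.

```python
from fractions import Fraction
from math import isqrt

def r(n):
    """n-digit decimal truncation of sqrt(2), as an exact rational number."""
    return Fraction(isqrt(2 * 10 ** (2 * n)), 10 ** n)

def cauchy_bound_holds(N, span=6):
    """Check |r_k - r_n| <= 10^(-N) for all k, n in [N, N + span)."""
    bound = Fraction(1, 10 ** N)
    indices = range(N, N + span)
    return all(abs(r(k) - r(n)) <= bound for k in indices for n in indices)

def square_gap(n):
    """2 - r(n)^2, which stays positive: no term is a rational square root of 2."""
    return 2 - r(n) ** 2
```

For instance `r(4)` is the rational number 1.4142, and `square_gap(n)` shrinks toward 0 without ever reaching it, in line with the terms approaching $\sqrt{2}$ from inside $\mathbb{Q}$ without a rational limit.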

A metric space M is complete if each Cauchy sequence in M converges to a limit 
in M. Theorem 1.5 states that R is complete. 

23 Theorem $\mathbb{R}^m$ is complete.

Proof Let $(p_n)$ be a Cauchy sequence in $\mathbb{R}^m$. Express $p_n$ in components as

$p_n = (p_{1n}, \ldots, p_{mn}).$

Because $(p_n)$ is Cauchy, each component sequence $(p_{in})_{n \in \mathbb{N}}$ is Cauchy. Completeness
of $\mathbb{R}$ implies that the component sequences converge, and therefore the vector
sequence converges. □

24 Theorem Every closed subset of a complete metric space is a complete metric 
subspace. 

Proof Let $A$ be a closed subset of the complete metric space $M$ and let $(p_n)$ be a
Cauchy sequence in $A$ with respect to the inherited metric. It is of course also a
Cauchy sequence in $M$ and therefore it converges to a limit $p$ in $M$. Since $A$ is closed
we have $p \in A$. □


25 Corollary Every closed subset of Euclidean space is a complete metric space.

Proof Obvious from the previous theorem and completeness of $\mathbb{R}^m$. □


Remark Completeness is not a topological property. For example, consider $\mathbb{R}$ with
its usual metric and $(-1, 1)$ with the metric it inherits from $\mathbb{R}$. Although they are
homeomorphic metric spaces, $\mathbb{R}$ is complete but $(-1, 1)$ is not.


In Section 10 we explain how every metric space can be completed. 
4 Compactness

Compactness is the single most important concept in real analysis. It is what reduces 
the infinite to the finite. 

Definition A subset $A$ of a metric space $M$ is (sequentially) compact if every
sequence $(a_n)$ in $A$ has a subsequence $(a_{n_k})$ that converges to a limit in $A$.

The empty set and finite sets are trivial examples of compact sets. For a sequence
$(a_n)$ contained in a finite set repeats a term infinitely often, and the corresponding
constant subsequence converges.

Compactness is a good feature of a set. We will develop criteria to decide whether 
a set is compact. The first is the most often used, but beware! - its converse is 
generally false. 

26 Theorem Every compact set is closed and bounded. 


Proof Suppose that $A$ is a compact subset of the metric space $M$ and that $p$ is
a limit of $A$. Does $p$ belong to $A$? There is a sequence $(a_n)$ in $A$ converging to
$p$. By compactness, some subsequence $(a_{n_k})$ converges to some $q \in A$, but every
subsequence of a convergent sequence converges to the same limit as does the mother
sequence, so $q = p$ and $p \in A$. Thus $A$ is closed.

To see that $A$ is bounded, choose and fix any point $p \in M$. Either $A$ is bounded
or else for each $n \in \mathbb{N}$ there is a point $a_n \in A$ such that $d(p, a_n) \geq n$. Compactness
implies that some subsequence $(a_{n_k})$ converges. Convergent sequences are bounded,
which contradicts the fact that $d(p, a_{n_k}) \to \infty$ as $k \to \infty$. Therefore $(a_n)$ cannot exist
and for some large $r$ we have $A \subset M_r p$, which is what it means that $A$ is bounded. □


27 Theorem The closed interval $[a, b] \subset \mathbb{R}$ is compact.

Proof Let $(x_n)$ be a sequence in $[a, b]$ and set

$C = \{x \in [a, b] : x_n < x \text{ only finitely often}\}.$

Equivalently, for all but finitely many $n$, $x_n \geq x$. Since $a \in C$ we know that $C \neq \emptyset$.
Clearly $b$ is an upper bound for $C$. By the least upper bound property of $\mathbb{R}$ there
exists $c = \mathrm{l.u.b.}\, C$ with $c \in [a, b]$. We claim that a subsequence of $(x_n)$ converges to
$c$. Suppose not, i.e., no subsequence of $(x_n)$ converges to $c$. Then for some $r > 0$, $x_n$
lies in $(c - r, c + r)$ only finitely often, which implies that $c + r \in C$, contrary to
$c$ being an upper bound for $C$. Hence some subsequence of $(x_n)$ does converge to $c$
and $[a, b]$ is compact. □
To pass from $\mathbb{R}$ to $\mathbb{R}^m$ we think about compactness for Cartesian products.

28 Theorem The Cartesian product of two compact sets is compact.

Proof Let $(a_n, b_n) \in A \times B$ be given where $A \subset M$ and $B \subset N$ are compact. There
exists a subsequence $(a_{n_k})$ that converges to some point $a \in A$ as $k \to \infty$. The
subsequence $(b_{n_k})$ has a sub-subsequence $(b_{n_{k(\ell)}})$ that converges to some $b \in B$ as
$\ell \to \infty$. The sub-subsequence $(a_{n_{k(\ell)}})$ continues to converge to the point $a$. Thus

$(a_{n_{k(\ell)}}, b_{n_{k(\ell)}}) \to (a, b)$

as $\ell \to \infty$. This implies that $A \times B$ is compact. □


29 Corollary The Cartesian product of $m$ compact sets is compact.

Proof Write $A_1 \times A_2 \times \cdots \times A_m = A_1 \times (A_2 \times \cdots \times A_m)$ and perform induction on
$m$. (Theorem 28 handles the bottom case $m = 2$.) □

30 Corollary Every box $[a_1, b_1] \times \cdots \times [a_m, b_m]$ in $\mathbb{R}^m$ is compact.

Proof Obvious from Theorem 27 and the previous corollary. □


An equivalent formulation of these results is the 

31 Bolzano-Weierstrass Theorem Every bounded sequence in $\mathbb{R}^m$ has a convergent
subsequence.

Proof A bounded sequence is contained in a box, which is compact, and therefore
the sequence has a subsequence that converges to a limit in the box. See also
Exercise 11. □


Here is a simple fact about compacts. 

32 Theorem Every closed subset of a compact is compact. 

Proof If $A$ is a closed subset of the compact set $K$ and if $(a_n)$ is a sequence of points
in $A$ then clearly $(a_n)$ is also a sequence of points in $K$, so by compactness of $K$ there
is a subsequence $(a_{n_k})$ converging to a limit $p \in K$. Since $A$ is closed, $p$ lies in $A$
which proves that $A$ is compact. □


Now we come to the first partial converse to Theorem 26. 
33 Heine-Borel Theorem Every closed and bounded subset of $\mathbb{R}^m$ is compact.

Proof Let $A \subset \mathbb{R}^m$ be closed and bounded. Boundedness implies that $A$ is contained
in some box, which is compact. Since $A$ is closed, Theorem 32 implies that $A$ is
compact. See also Exercise 11. □

The Heine-Borel Theorem states that closed and bounded subsets of Euclidean
space are compact, but it is vital‡ to remember that a closed and bounded subset
of a general metric space may fail to be compact. For example, the set $\mathbb{N}$ of natural
numbers equipped with the discrete metric is a complete metric space, it is closed in
itself (as is every metric space), and it is bounded. But it is not compact. After all,
what subsequence of $1, 2, 3, \ldots$ converges?

A more striking example occurs in the metric space $C([0, 1], \mathbb{R})$ whose metric is
$d(f, g) = \max\{|f(x) - g(x)|\}$. The space is complete but its closed unit ball is not
compact. For example, the sequence of functions $f_n = x^n$ has no subsequence that
converges with respect to the metric $d$. In fact every closed ball is noncompact.
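One way to see why no subsequence of $f_n = x^n$ converges: for $m \geq 2n$ we have $x^m \leq x^{2n}$ on $[0, 1]$, so $d(f_n, f_m) \geq \max (x^n - x^{2n}) = 1/4$, the maximum being attained where $x^n = 1/2$. Hence no subsequence is Cauchy. The sketch below estimates these distances on a uniform grid; the grid size and helper names are assumptions made for illustration, and a grid maximum only approximates the true maximum from below.

```python
def sup_distance(f, g, samples=10001):
    """Approximate max over [0, 1] of |f(x) - g(x)| using a uniform grid."""
    points = (i / (samples - 1) for i in range(samples))
    return max(abs(f(x) - g(x)) for x in points)

def power(n):
    """The function x -> x^n on [0, 1]."""
    return lambda x: x ** n

# Pairs (n, m) with m >= 2n; each distance should come out near or above 1/4.
pairs = [(1, 2), (5, 10), (5, 37), (50, 100)]
distances = {(n, m): sup_distance(power(n), power(m)) for n, m in pairs}
```

The pairs with $m = 2n$ give values close to 0.25, while a pair such as $(5, 37)$ gives something larger, consistent with the bound.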

Ten Examples of Compact Sets 

1. Any finite subset of a metric space, for instance the empty set. 

2. Any closed subset of a compact set. 

3. The union of finitely many compact sets. 

4. The Cartesian product of finitely many compact sets. 

5. The intersection of arbitrarily many compact sets. 

6. The closed unit ball in R 3 . 

7. The boundary of a compact set, for instance the unit 2-sphere in R 3 . 

8. The set $\{x \in \mathbb{R} : \exists n \in \mathbb{N} \text{ and } x = 1/n\} \cup \{0\}$.

9. The Hawaiian earring. See page 58. 

10. The Cantor set. See Section 8. 

Nests of Compacts 

If $A_1 \supset A_2 \supset \cdots \supset A_n \supset A_{n+1} \supset \ldots$ then $(A_n)$ is a nested sequence of sets.
Its intersection is

$\bigcap_{n=1}^{\infty} A_n = \{p : \text{for each } n \text{ we have } p \in A_n\}.$

‡I have asked variants of the following True or False question on every analysis exam I’ve given:
“Every closed and bounded subset of a complete metric space is compact.” You would be surprised
at how many students answer “True.”
See Figure 38.

Figure 38 A nested sequence of sets


For example, we could take $A_n$ to be the disc $\{z \in \mathbb{R}^2 : |z| \leq 1/n\}$. The intersection
of all the sets $A_n$ is then the singleton $\{0\}$. On the other hand, if $A_n$ is the ball
$\{z \in \mathbb{R}^3 : |z| \leq 1 + 1/n\}$ then $\bigcap A_n$ is the closed unit ball $B^3$.


34 Theorem The intersection of a nested sequence of compact nonempty sets is 
compact and nonempty. 


Proof Let $(A_n)$ be such a sequence. By Theorem 26, $A_n$ is closed. The intersection
of closed sets is always closed. Thus, $\bigcap A_n$ is a closed subset of the compact set $A_1$
and is therefore compact. It remains to show that the intersection is nonempty.

$A_n$ is nonempty, so for each $n \in \mathbb{N}$ we can choose $a_n \in A_n$. The sequence $(a_n)$
lies in $A_1$ since the sets are nested. Compactness of $A_1$ implies that $(a_n)$ has a
subsequence $(a_{n_k})$ converging to some point $p \in A_1$. The limit $p$ also lies in the set
$A_2$ since except possibly for the first term, the subsequence $(a_{n_k})$ lies in $A_2$ and $A_2$
is a closed set. The same is true for $A_3$ and for all the sets in the nested sequence.
Thus, $p \in \bigcap A_n$ and $\bigcap A_n$ is shown to be nonempty. □


The diameter of a nonempty set S C M is the supremum of the distances d(x, y) 
between points of S. 
35 Corollary If in addition to being nested, nonempty, and compact, the sets $A_n$
have diameter that tends to 0 as $n \to \infty$ then $A = \bigcap A_n$ is a single point.

Proof For each $n \in \mathbb{N}$, $A$ is a subset of $A_n$, which implies that $A$ has diameter zero.
Since distinct points lie at positive distance from each other, $A$ consists of at most one
point, while by Theorem 34 it consists of at least one point. See also Exercise 52. □



Figure 39 This nested sequence has empty intersection. 

Figure 39 shows a nested sequence of nonempty noncompact sets with empty intersection.
They are the open discs with center $(1/n, 0)$ on the $x$-axis and radius $1/n$.
They contain no common point. (Their closures do intersect at a common point, the
origin.)
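To make the empty intersection concrete: a point $(x, y)$ lies in the $n$-th disc exactly when $(x - 1/n)^2 + y^2 < 1/n^2$, that is, when $n(x^2 + y^2) < 2x$, and for any fixed point this fails once $n$ is large enough. The sketch below uses exact rational arithmetic to find the first disc that misses a given point; the function names and the sample grid are illustrative choices.

```python
from fractions import Fraction

def in_disc(p, n):
    """True when p lies in the open disc with center (1/n, 0) and radius 1/n."""
    x, y = p
    return (x - Fraction(1, n)) ** 2 + y ** 2 < Fraction(1, n * n)

def first_missing(p):
    """Smallest n whose disc does not contain p (such an n always exists)."""
    n = 1
    while in_disc(p, n):
        n += 1
    return n

# Nestedness check on a small rational grid: membership in disc n + 1
# should force membership in disc n.
sample = [(Fraction(i, 8), Fraction(j, 8)) for i in range(-4, 17) for j in range(-8, 9)]
nested = all(in_disc(p, n) for p in sample for n in range(1, 10) if in_disc(p, n + 1))
```

For example, the point $(1/2, 0)$ lies in the first three discs but not in the fourth, and the origin lies in none of them even though it belongs to every closure.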

Continuity and Compactness 

Next we discuss how compact sets behave under continuous transformations. 

36 Theorem If $f : M \to N$ is continuous and $A$ is a compact subset of $M$ then $fA$
is a compact subset of $N$. That is, the continuous image of a compact is compact.

Proof Suppose that $(b_n)$ is a sequence in $fA$. For each $n \in \mathbb{N}$ choose a point $a_n \in A$
such that $f(a_n) = b_n$. By compactness of $A$ there exists a subsequence $(a_{n_k})$ that
converges to some point $p \in A$. By continuity of $f$ it follows that

$b_{n_k} = f(a_{n_k}) \to fp \in fA$

as $k \to \infty$. Thus, every sequence $(b_n)$ in $fA$ has a subsequence converging to a limit
in $fA$, which shows that $fA$ is compact. □

From Theorem 36 follows the natural generalization of the min/max theorem in
Chapter 1 which concerns continuous real-valued functions defined on an interval
$[a, b]$. See Theorem 1.23.

37 Corollary A continuous real-valued function defined on a compact set is bounded;
it assumes maximum and minimum values.

Proof Let $f : M \to \mathbb{R}$ be continuous and let $A$ be a compact subset of $M$. Theorem 36
implies that $fA$ is a compact subset of $\mathbb{R}$, so by Theorem 26 it is closed and
bounded. Thus, the greatest lower bound, $v$, and least upper bound, $V$, of $fA$ exist
and belong to $fA$; i.e., there exist points $p, P \in A$ such that for all $a \in A$ we have
$v = fp \leq fa \leq fP = V$. □


Homeomorphisms and Compactness 

A homeomorphism is a bicontinuous bijection. Originally, compactness was called 
bicompactness. This is reflected in the next theorem. 

38 Theorem If M is compact and M is homeomorphic to N then N is compact. 
Compactness is a topological property. 

Proof N is the continuous image of M, so by Theorem 36 it is compact. □ 

39 Corollary [0, 1] and R are not homeomorphic. 

Proof One is compact and the other isn’t. □ 

40 Theorem If $M$ is compact then a continuous bijection $f : M \to N$ is a
homeomorphism - its inverse bijection $f^{-1} : N \to M$ is automatically continuous.

Proof Suppose that $q_n \to q$ in $N$. Since $f$ is a bijection, $p_n = f^{-1}(q_n)$ and $p = f^{-1}(q)$
are well defined points in $M$. To check continuity of $f^{-1}$ we must show that $p_n \to p$.

If $(p_n)$ refuses to converge to $p$ then there is a subsequence $(p_{n_k})$ and a $\delta > 0$ such
that for all $k$ we have $d(p_{n_k}, p) \geq \delta$. Compactness of $M$ gives a sub-subsequence
$(p_{n_{k(\ell)}})$ that converges to a point $p^* \in M$ as $\ell \to \infty$.

Necessarily, $d(p, p^*) \geq \delta$, which implies that $p \neq p^*$. Since $f$ is continuous we
have

$f(p_{n_{k(\ell)}}) \to f(p^*)$

as $\ell \to \infty$. The limit of a convergent sequence is unchanged by passing to a
subsequence, and so $f(p_{n_{k(\ell)}}) = q_{n_{k(\ell)}} \to q$ as $\ell \to \infty$. Thus, $f(p^*) = q = f(p)$, contrary to
$f$ being a bijection. It follows that $p_n \to p$ and therefore that $f^{-1}$ is continuous. □

If $M$ is not compact then Theorem 40 becomes false. For example, the function
$f : [0, 2\pi) \to \mathbb{R}^2$ defined by $f(x) = (\cos x, \sin x)$ is a continuous bijection onto the
unit circle in the plane, but it is not a homeomorphism. This useful example was
discussed on page 65. Not only does this $f$ fail to be a homeomorphism, but there
is no homeomorphism at all from $[0, 2\pi)$ to $S^1$. The circle is compact while $[0, 2\pi)$ is
not. Therefore they are not homeomorphic. See also Exercises 49 and 50.
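The failure of the inverse to be continuous can be seen along the sequence $x_n = 2\pi - 1/n$: the images $f(x_n)$ approach $f(0) = (1, 0)$ on the circle, yet the points $x_n$ stay far from 0 in $[0, 2\pi)$. A short numerical sketch follows; the choice of sequence and the function names are illustrative.

```python
import math

def wrap(x):
    """The map x -> (cos x, sin x) from [0, 2*pi) onto the unit circle."""
    return (math.cos(x), math.sin(x))

def approach(n):
    """Return x_n = 2*pi - 1/n together with the distance from wrap(x_n) to wrap(0)."""
    x_n = 2 * math.pi - 1 / n
    return x_n, math.dist(wrap(x_n), wrap(0.0))

# Each entry is (x_n, distance on the circle side); the second shrinks, the first does not.
table = [approach(n) for n in (1, 10, 100, 1000)]
```

The distance on the circle side is a chord of an arc of length $1/n$, so it is at most $1/n$, while $|x_n - 0|$ exceeds 5 for every $n$.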

Embedding a Compact 

Not only is a compact space M closed in itself, as is every metric space, but it 
is also a closed subset of each metric space in which it is embedded. More precisely 
we say that $h : M \to N$ embeds $M$ into $N$ if $h$ is a homeomorphism from $M$ onto
hM . (The metric on hM is the one it inherits from N.) Topologically M and hM 
are equivalent. A property of M that holds also for every embedded copy of M is an 
absolute or intrinsic property of M . 

41 Theorem A compact is absolutely closed and absolutely bounded. 

Proof Obvious from Theorems 26 and 36. □ 

For example, no matter how the circle is embedded in R 3 , its image is always 
closed and bounded. See also Exercises 48 and 120. 

Uniform Continuity and Compactness 

In Chapter 1 we defined the concept of uniform continuity for real-valued functions
of a real variable. The definition in metric spaces is analogous. A function $f : M \to N$
is uniformly continuous if for each $\epsilon > 0$ there exists a $\delta > 0$ such that

$p, q \in M$ and $d_M(p, q) < \delta \Rightarrow d_N(fp, fq) < \epsilon.$

42 Theorem Every continuous function defined on a compact is uniformly continuous.

Proof Suppose not, and $f : M \to N$ is continuous, $M$ is compact, but $f$ fails to
be uniformly continuous. Then there is some $\epsilon > 0$ such that no matter how small
$\delta$ is, there exist points $p, q \in M$ with $d(p, q) < \delta$ but $d(fp, fq) \geq \epsilon$. Take $\delta = 1/n$
and let $(p_n)$ and $(q_n)$ be sequences of points in $M$ such that $d(p_n, q_n) < 1/n$ while
$d(f(p_n), f(q_n)) \geq \epsilon$. Compactness of $M$ implies that there is a subsequence $(p_{n_k})$ which
converges to some $p \in M$ as $k \to \infty$. Since $d(p_n, q_n) < 1/n \to 0$ as $n \to \infty$, $(q_{n_k})$
converges to the same limit as does $(p_{n_k})$ as $k \to \infty$; namely $q_{n_k} \to p$. Continuity at
$p$ implies that $f(p_{n_k}) \to fp$ and $f(q_{n_k}) \to fp$. If $k$ is large then

$d(f(p_{n_k}), f(q_{n_k})) \leq d(f(p_{n_k}), fp) + d(fp, f(q_{n_k})) < \epsilon,$

contrary to the supposition that $d(f(p_n), f(q_n)) \geq \epsilon$ for all $n$. □


Theorem 42 gives a second proof that continuity implies uniform continuity on an
interval $[a, b]$. For $[a, b]$ is compact.


5 Connectedness 

As another application of these ideas, we consider the general notion of connectedness. 
Let A be a subset of a metric space M. If A is neither the empty set nor M then A 
is a proper subset of M. Recall that if A is both closed and open in M it is said to 
be clopen. The complement of a clopen set is clopen. The complement of a proper 
subset is proper. 

If $M$ has a proper clopen subset $A$ then $M$ is disconnected. For there is a
separation of $M$ into proper, disjoint clopen subsets,

$M = A \sqcup A^c.$

(The notation $\sqcup$ indicates disjoint union.) $M$ is connected if it is not disconnected,
i.e., it contains no proper clopen subset. Connectedness of $M$ does not mean that $M$
is connected to something, but rather that $M$ is one unbroken thing. See Figure 40.



Figure 40 $M$ and $N$ illustrate the difference between being connected and being disconnected.
43 Theorem If $M$ is connected, $f : M \to N$ is continuous, and $f$ is onto then $N$ is
connected. The continuous image of a connected is connected.

Proof Simple! If $A$ is a clopen proper subset of $N$ then, according to the open and
closed set conditions for continuity, $f^{\mathrm{pre}}(A)$ is a clopen subset of $M$. Since $f$ is onto
and $A \neq \emptyset$, we have $f^{\mathrm{pre}}(A) \neq \emptyset$. Similarly, $f^{\mathrm{pre}}(A^c) \neq \emptyset$. Therefore $f^{\mathrm{pre}}(A)$ is a
proper clopen subset of $M$, contrary to $M$ being connected. It follows that $A$ cannot
exist and that $N$ is connected. □

44 Corollary If M is connected and M is homeomorphic to N then N is connected. 
Connectedness is a topological property. 

Proof N is the continuous image of M, so Theorem 43 implies it is connected. □ 

45 Corollary (Generalized Intermediate Value Theorem) Every continuous
real-valued function defined on a connected domain has the intermediate value property.

Proof Assume that $f : M \to \mathbb{R}$ is continuous and $M$ is connected. If $f$ assumes
values $\alpha < \beta$ in $\mathbb{R}$ and if it fails to assume some value $\gamma$ with $\alpha < \gamma < \beta$, then

$M = \{x \in M : f(x) < \gamma\} \sqcup \{x \in M : f(x) > \gamma\}$

is a separation of $M$, contrary to connectedness. □

46 Theorem R is connected. 

Proof If $U \subset \mathbb{R}$ is nonempty and clopen we claim that $U = \mathbb{R}$. Choose some $p \in U$
and consider the set

$X = \{x \in U : \text{the open interval } (p, x) \text{ is contained in } U\}.$

$X$ is nonempty since $U$ is open. Let $s$ be the supremum of $X$. If $s$ is finite (i.e., $X$ is
bounded above) then $s = \mathrm{l.u.b.}\, X$ and $s$ is a limit of $X$. Since $X \subset U$ and $U$ is closed
we have $s \in U$. Since $U$ is open there is an interval $(s - r, s + r) \subset U$, which gives
an interval $(p, s + r) \subset U$, contrary to $s$ being an upper bound for $X$. Hence $s = \infty$
and $U \supset (p, \infty)$. The same reasoning gives $U \supset (-\infty, p)$, so $U = \mathbb{R}$ as claimed. Thus
there are no proper clopen subsets of $\mathbb{R}$ and $\mathbb{R}$ is connected. □

47 Corollary (Intermediate Value Theorem for $\mathbb{R}$) Every continuous function
$f : \mathbb{R} \to \mathbb{R}$ has the intermediate value property.

Proof Immediate from the Generalized Intermediate Value Theorem and connectedness
of $\mathbb{R}$. □


48 Corollary The following metric spaces are connected: The intervals $(a, b)$, $[a, b]$,
the circle, and all capital letters of the Roman alphabet.

Proof The interval $(a, b)$ is homeomorphic to $\mathbb{R}$, while $[a, b]$ is the continuous image
of $\mathbb{R}$ under the map whose graph is shown in Figure 41. The circle is the continuous
image of $\mathbb{R}$ under the map $t \mapsto (\cos t, \sin t)$. Connectedness of the letters A, . . . , Z is
equally clear. □


Figure 41 The function $f$ surjects $\mathbb{R}$ continuously to $[a, b]$.

Connectedness properties give a good way to distinguish nonhomeomorphic sets. 

Example The union of two disjoint closed intervals is not homeomorphic to a single 
interval. One set is disconnected and the other is connected. 


Example The closed interval $[a, b]$ is not homeomorphic to the circle $S^1$. For removal
of a point $x \in (a, b)$ disconnects $[a, b]$ while the circle remains connected upon removal
of any point. More precisely, suppose that $h : [a, b] \to S^1$ is a homeomorphism.
Choose a point $x \in (a, b)$, and consider $X = [a, b] \setminus \{x\}$. The restriction of $h$ to $X$ is
a homeomorphism from $X$ onto $Y$, where $Y$ is the circle with the point $hx$ removed.
But $X$ is disconnected while $Y$ is connected. Hence $h$ cannot exist and the segment
is not homeomorphic to the circle.


Example The circle is not homeomorphic to the figure eight. Removing any two
points of the circle disconnects it, but this is not true of the figure eight. Or, removing
the crossing point disconnects the figure eight but removing any point of the circle
leaves it connected.

Example The circle is not homeomorphic to the disc. For removing two points
disconnects the circle but does not disconnect the disc.


As you can see, it is useful to be able to recognize disconnected subsets $S$ of a
metric space $M$. By definition, $S$ is a disconnected subset of $M$ if it is disconnected
when considered in its own right as a metric space with the metric it inherits from
$M$; i.e., it has a separation $S = A \sqcup B$ such that $A$ and $B$ are proper clopen subsets
of $S$. The sets $A$, $B$ are separated in $S$ but they need not be separated in $M$. Their
closures in $M$ may intersect.

Example The punctured interval $X = [a, b] \setminus \{c\}$ is disconnected if $a < c < b$. For
$X = [a, c) \sqcup (c, b]$ is a separation of $X$. The closures of the two sets with respect to
the metric space $X$ do not intersect, even though their closures with respect to $\mathbb{R}$
do intersect. Pay attention to this phenomenon which is related to the Inheritance
Principle.

Example Any subset $Y$ of the punctured interval is disconnected if it meets both
$[a, c)$ and $(c, b]$. For $Y = ([a, c) \cap Y) \sqcup ((c, b] \cap Y)$ is a separation of $Y$.


49 Theorem The closure of a connected set is connected. More generally, if $S \subset M$
is connected and $S \subset T \subset \overline{S}$ then $T$ is connected.

Proof It is equivalent to show that if $T$ is disconnected then $S$ is disconnected.
Disconnectedness of $T$ implies that

$T = A \sqcup B$

where $A$, $B$ are clopen and proper in $T$. It is natural to expect that

$S = K \sqcup L$

is a separation of $S$ where $K = A \cap S$ and $L = B \cap S$. The sets $K$ and $L$ are disjoint,
their union is $S$, and by the Inheritance Principle they are clopen. But are they
proper?

If $K = \emptyset$ then $A \subset S^c$. Since $A$ is proper there exists $p \in A$. Since $A$ is open in
$T$, there exists a neighborhood $M_r p$ such that

$T \cap M_r p \subset A \subset S^c.$

The neighborhood $M_r p$ contains no points of $S$, which is contrary to $p$ belonging to
$\overline{S}$. Thus, $K \neq \emptyset$. Similarly, $L = B \cap S \neq \emptyset$, so $S = K \sqcup L$ is a separation of $S$,
proving that $S$ is disconnected. □

Example The outward spiral expressed in polar coordinates as

$S = \{(r, \theta) : (1 - r)\theta = 1 \text{ and } \theta \geq \pi/2\}$

has $\overline{S} = S \cup S^1$, where $S^1$ is the unit circle. Since $S$ is connected, so is $\overline{S}$. (Recall
that $\overline{S}$ is the closure of $S$.) See Figure 27.

50 Theorem The union of connected sets sharing a common point p is connected. 

Proof Let $S = \bigcup S_\alpha$, where each $S_\alpha$ is connected and $p \in \bigcap S_\alpha$. If $S$ is disconnected
then it has a separation $S = A \sqcup A^c$ where $A$, $A^c$ are proper and clopen. One of
them contains $p$; say it is $A$. Then $A \cap S_\alpha$ is a nonempty clopen subset of $S_\alpha$. Since
$S_\alpha$ is connected, $A \cap S_\alpha = S_\alpha$ for each $\alpha$, and $A = S$. This implies that $A^c = \emptyset$, a
contradiction. Therefore $S$ is connected. □

Example The 2-sphere $S^2$ is connected. For $S^2$ is the union of great circles, each
passing through the poles.

Example Every convex set $C$ in $\mathbb{R}^m$ (or in any metric space with a compatible linear
structure) is connected. If we choose a point $p \in C$ then each $q \in C$ lies on a line
segment $[p, q] \subset C$. Thus, $C$ is the union of connected sets sharing the common point
$p$. It is connected.

Definition A path joining p to q in a metric space M is a continuous function 
/ : [a, b\ — >> M such that fa — p and fb — q. If each pair of points in M can be joined 
by a path in M then M is path-connected. See Figure 42. 


51 Theorem Path-connected implies connected. 


Proof Assume that $M$ is path-connected but not connected. Then $M = A \sqcup A^c$ for 
some proper clopen $A \subset M$. Choose $p \in A$ and $q \in A^c$. There is a path $f : [a, b] \to M$ 
from $p$ to $q$. The separation $f^{\text{pre}}(A) \sqcup f^{\text{pre}}(A^c)$ contradicts connectedness of $[a, b]$. 
Therefore $M$ is connected. □ 


Example All connected subsets of $\mathbb{R}$ are path-connected. See Exercise 67. 

Figure 42 A path $f$ in $M$ that joins $p$ to $q$ 


Example Every open connected subset of $\mathbb{R}^m$ is path-connected. See Exercises 61 
and 66. 

Example The topologist’s sine curve is a compact connected set that is not 
path-connected. It is $M = G \cup Y$ where 

$$G = \{(x,y) \in \mathbb{R}^2 : y = \sin 1/x \text{ and } 0 < x \leq 1/\pi\}$$

$$Y = \{(0,y) \in \mathbb{R}^2 : -1 \leq y \leq 1\}.$$

See Figure 43. The metric on $M$ is just Euclidean distance. Is $M$ connected? Yes! 

Figure 43 The topologist’s sine curve $M$ is a connected set. It includes the 
vertical segment $Y$ at $x = 0$. 

The graph $G$ is connected and $M = \overline{G}$. By Theorem 49 $M$ is connected. 

6 Other Metric Space Concepts 

Here are a few standard metric space topics related to what appears above. If $S \subset M$ 
then its closure is the smallest closed subset of $M$ that contains $S$, its interior is the 
largest open subset of $M$ contained in $S$, and its boundary is the difference between 
its closure and its interior. Their notations are 

$$\overline{S} = \operatorname{cl} S = \text{closure of } S \qquad \operatorname{int} S = \text{interior of } S \qquad \partial S = \text{boundary of } S.$$

To avoid inheritance ambiguity it would be better (but too cumbersome) to write 
$\operatorname{cl}_M S$, $\operatorname{int}_M S$, and $\partial_M S$ to indicate the ambient space $M$. In Exercise 95 you are 
asked to check various simple facts about them, such as $\overline{S} = \lim S =$ the intersection 
of all closed sets that contain $S$. 

Clustering and Condensing 

Two concepts similar to limits are clustering and condensing. The set $S$ clusters 
at $p$ (and $p$ is a cluster point† of $S$) if each $M_r p$ contains infinitely many points 
of $S$. The set $S$ condenses at $p$ (and $p$ is a condensation point of $S$) if each 
$M_r p$ contains uncountably many points of $S$. Thus, $S$ limits at $p$, clusters at $p$, or 
condenses at $p$ according to whether each $M_r p$ contains some, infinitely many, or 
uncountably many points of $S$. See Figure 44. 

Figure 44 Limiting, clustering, and condensing behavior 

†Cluster points are also called accumulation points. As mentioned above, they are also 
sometimes called limit points, a usage that conflicts with the limit idea. A finite set $S$ has no cluster 
points, but of course, each of its points $p$ is a limit of $S$ since the constant sequence $(p, p, p, \ldots)$ 
converges to $p$. 

52 Theorem The following are equivalent conditions to $S$ clustering at $p$. 

(i) There is a sequence of distinct points in $S$ that converges to $p$. 

(ii) Each neighborhood of $p$ contains infinitely many points of $S$. 

(iii) Each neighborhood of $p$ contains at least two points of $S$. 

(iv) Each neighborhood of $p$ contains at least one point of $S$ other than $p$. 


Proof Clearly (i) $\Rightarrow$ (ii) $\Rightarrow$ (iii) $\Rightarrow$ (iv), and (ii) is the definition of clustering. It 
remains to check (iv) $\Rightarrow$ (i). 

Assume (iv) is true: Each neighborhood of $p$ contains a point of $S$ other than 
$p$. In $M_1 p$ choose a point $p_1 \in (S \setminus \{p\})$. Set $r_2 = \min(1/2, d(p_1, p))$, and in 
the smaller neighborhood $M_{r_2} p$, choose $p_2 \in (S \setminus \{p\})$. Proceed inductively: Set 
$r_n = \min(1/n, d(p_{n-1}, p))$ and in $M_{r_n} p$, choose $p_n \in (S \setminus \{p\})$. Since $r_n \to 0$ the 
sequence $(p_n)$ converges to $p$. The points $p_n$ are distinct since they have different 
distances to $p$, 

$$d(p_1, p) \geq r_2 > d(p_2, p) \geq r_3 > d(p_3, p) \geq \cdots.$$

Thus (iv) $\Rightarrow$ (i) and the four conditions are equivalent. □ 
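The inductive choice in the proof of (iv) implies (i) can be traced on a concrete set. The following is a minimal sketch, not part of the original text, assuming $S = \{1/k : k \in \mathbb{N}\} \subset \mathbb{R}$ and $p = 0$; the names `pick_point` and `distinct_sequence` are hypothetical, and `pick_point` plays the role of the unspecified choice of a point of $S \setminus \{p\}$ inside a given neighborhood.

```python
from fractions import Fraction
import math


def pick_point(r):
    """Hypothetical choice function for S = {1/k : k = 1, 2, ...} and p = 0:
    return a point of S other than p whose distance to p is less than r."""
    k = math.floor(1 / r) + 1      # k > 1/r, so 0 < 1/k < r
    return Fraction(1, k)


def distinct_sequence(p, dist, pick, n_terms):
    """Follow the proof: r_1 = 1 and r_n = min(1/n, d(p_{n-1}, p)) for n > 1."""
    points = []
    for n in range(1, n_terms + 1):
        if n == 1:
            r = Fraction(1)                                # start in M_1 p
        else:
            r = min(Fraction(1, n), dist(points[-1], p))   # shrink the radius
        points.append(pick(r))                             # a point of S in M_r p, not p
    return points


p = Fraction(0)
seq = distinct_sequence(p, lambda a, b: abs(a - b), pick_point, 6)
print(seq)
```

Each radius is at most the distance from the previous point to $p$, so the distances strictly decrease and no point can repeat; exact rational arithmetic is used only to keep the comparisons free of rounding.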


Condition (iv) is the form of the definition of clustering most frequently used, 
although it is the hardest to grasp. It is customary to denote by S' the set of cluster 
points of S. 

53 Proposition $S \cup S' = \overline{S}$. 

Proof A cluster point is a type of limit of $S$, so $S' \subset \lim S = \overline{S}$ and 

$$S \cup S' \subset \overline{S}.$$

On the other hand, if $p \in \overline{S}$ then either $p \in S$ or else $p \notin S$ and each neighborhood 
of $p$ contains points of $S$ other than $p$. This implies that $p \in S \cup S'$, so $\overline{S} \subset S \cup S'$, 
and the two sets are equal. □ 

54 Corollary $S$ is closed if and only if $S' \subset S$. 

Proof $S$ is closed if and only if $S = \overline{S}$. Since $\overline{S} = S \cup S'$, equivalent to $S' \subset S$ is 
$S = \overline{S}$. □ 


55 Corollary The least upper bound and greatest lower bound of a nonempty bounded 
set $S \subset \mathbb{R}$ belong to the closure of $S$. Thus, if $S$ is closed then they belong to $S$. 


Proof If $b = \text{l.u.b.}\, S$ then each interval $(b - r, b]$ contains points of $S$. The same is 
true for intervals $[a, a + r)$ where $a = \text{g.l.b.}\, S$. □ 

Perfect Metric Spaces 

A metric space $M$ is perfect if $M' = M$, i.e., each $p \in M$ is a cluster point of 
$M$. Recall that $M$ clusters at $p$ if each $M_r p$ is an infinite set. For example $[a, b]$ is 
perfect and $\mathbb{Q}$ is perfect. $\mathbb{N}$ is not perfect since none of its points are cluster points. 

56 Theorem Every nonempty, perfect, complete metric space is uncountable. 

Proof Suppose not: Assume $M$ is nonempty, perfect, complete, and countable. Since 
$M$ consists of cluster points it must be denumerable and not finite. Say 

$$M = \{x_1, x_2, \ldots\}$$

is a list of all the elements of $M$. We will derive a contradiction by finding a point of 
$M$ not in the list. Define 

$$\widehat{M}_r p = \{q \in M : d(p, q) \leq r\}.$$

It is the closed neighborhood of radius $r$ at $p$. Choose any $y_1 \in M$ with $y_1 \neq x_1$ 
and choose $r_1 > 0$ so that $Y_1 = \widehat{M}_{r_1}(y_1)$ “excludes” $x_1$ in the sense that $x_1 \notin Y_1$. We 
can take $r_1$ as small as we want, say $r_1 < 1$. 

Since $M$ clusters at $y_1$ we can choose $y_2 \in M_{r_1}(y_1)$ with $y_2 \neq x_2$ and choose 
$r_2 > 0$ so that $Y_2 = \widehat{M}_{r_2}(y_2)$ excludes $x_2$. Taking $r_2$ small ensures $Y_2 \subset Y_1$. (Here we 
are using openness of $M_{r_1}(y_1)$.) Also we take $r_2 < 1/2$. Since $Y_2 \subset Y_1$, it excludes $x_1$ 
as well as $x_2$. See Figure 45. 

Nothing stops us from continuing inductively, and we get a nested sequence of 
closed neighborhoods $Y_1 \supset Y_2 \supset Y_3 \ldots$ such that $Y_n$ excludes $x_1, \ldots, x_n$, and has 
radius $r_n < 1/n$. Thus the center points $y_n$ form a Cauchy sequence. Completeness 
of $M$ implies that 

$$\lim_{n \to \infty} y_n = y \in M$$

exists. Since the sets $Y_n$ are closed and nested, $y \in Y_n$ for each $n$. Does $y$ equal $x_1$? 
No, for $Y_1$ excludes $x_1$. Does it equal $x_2$? No, for $Y_2$ excludes $x_2$. In fact, for each $n$ 
we have $y \neq x_n$. The point $y$ is nowhere in the supposedly complete list of elements 
of $M$, a contradiction. Hence $M$ is uncountable. □ 


57 Corollary R and [a, b] are uncountable. 


Proof R is complete and perfect, while [a, b] is compact, therefore complete, and 
perfect. Neither is empty. □ 

Figure 45 The exclusion of successively more points of the sequence $(x_n)$ 
that supposedly lists all the elements of $M$ 


58 Corollary Every nonempty perfect complete metric space is everywhere uncountable 
in the sense that each $r$-neighborhood is uncountable. 

Proof The $r/2$-neighborhood $M_{r/2}(p)$ is perfect: It clusters at each of its points. 
The closure of a perfect set is perfect. Thus, $\overline{M_{r/2}(p)}$ is perfect. Being a closed 
subset of a complete metric space, it is complete. According to Theorem 56, $\overline{M_{r/2}(p)}$ 
is uncountable. Since $\overline{M_{r/2}(p)} \subset M_r p$, $M_r p$ is uncountable. □ 


Continuity of Arithmetic in R 


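Addition is a mapping $\operatorname{Sum} : \mathbb{R} \times \mathbb{R} \to \mathbb{R}$ that assigns to $(x, y)$ the real number 
$x + y$. Subtraction and multiplication are also such mappings. Division is a mapping 
$\mathbb{R} \times (\mathbb{R} \setminus \{0\}) \to \mathbb{R}$ that assigns to $(x, y)$ the number $x/y$. 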


59 Theorem The arithmetic operations of R are continuous. 


60 Lemma For each real number $c$ the function $\operatorname{Mult}_c : \mathbb{R} \to \mathbb{R}$ that sends $x$ to $cx$ 
is continuous. 

Proof If $c = 0$ the function is constantly equal to 0 and is therefore continuous. If 
$c \neq 0$ and $\epsilon > 0$ is given, choose $\delta = \epsilon / |c|$. If $|x - y| < \delta$ then 

$$|\operatorname{Mult}_c(x) - \operatorname{Mult}_c(y)| = |c|\,|x - y| < |c|\,\delta = \epsilon$$

which shows that $\operatorname{Mult}_c$ is continuous. □ 


Proof of Theorem 59 We use the preservation of sequential convergence criterion 
for continuity. It’s simplest. Let $(x_n, y_n) \to (x, y)$ as $n \to \infty$. 

By the triangle inequality we have 

$$|\operatorname{Sum}(x_n, y_n) - \operatorname{Sum}(x, y)| \leq |x_n - x| + |y_n - y| = d_{\text{sum}}((x_n, y_n), (x, y)).$$

By Corollary 21 $d_{\text{sum}}$ is continuous, so $d_{\text{sum}}((x_n, y_n), (x, y)) \to 0$ as $n \to \infty$, which 
completes the proof that Sum is continuous. (By Theorem 17 it does not matter 
which metric we use on $\mathbb{R} \times \mathbb{R}$.) 

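Subtraction is the composition of continuous functions 

$$\operatorname{Sub}(x, y) = \operatorname{Sum} \circ (\operatorname{id} \times \operatorname{Mult}_{-1})(x, y)$$

and is therefore continuous. (Proposition 3 implies id is continuous, Lemma 60 implies 
$\operatorname{Mult}_{-1}$ is continuous, and Corollary 18 implies $\operatorname{id} \times \operatorname{Mult}_{-1}$ is continuous.) 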

Multiplication is continuous since 

$$|\operatorname{Mult}(x_n, y_n) - \operatorname{Mult}(x, y)| = |x_n y_n - x y| \leq |x_n - x|\,|y_n| + |x|\,|y_n - y|$$
$$\leq B(|x - x_n| + |y - y_n|) = \operatorname{Mult}_B(d_{\text{sum}}((x_n, y_n), (x, y))) \to 0$$

as $n \to \infty$, where we use the fact that convergent sequences are bounded to write 
$|y_n| + |x| \leq B$ for all $n$. 

Reciprocation is the function $\operatorname{Rec} : \mathbb{R} \setminus \{0\} \to \mathbb{R} \setminus \{0\}$ that sends $x$ to $1/x$. If 
$x_n \to x \neq 0$ then there is a constant $b > 0$ such that for all large $n$ we have $|1/x_n| \leq b$ 
and $|1/x| \leq b$. Since 

$$|\operatorname{Rec}(x_n) - \operatorname{Rec}(x)| = \left| \frac{1}{x_n} - \frac{1}{x} \right| = \frac{|x - x_n|}{|x x_n|} \leq \operatorname{Mult}_{b^2}(|x_n - x|) \to 0$$

as $n \to \infty$ we see that Rec is continuous. 


Division is continuous on $\mathbb{R} \times (\mathbb{R} \setminus \{0\})$ since it is the composite of continuous 
mappings $\operatorname{Mult} \circ (\operatorname{id} \times \operatorname{Rec}) : (x, y) \mapsto (x, 1/y) \mapsto x \cdot 1/y$. □ 

The absolute value is a mapping $\operatorname{Abs} : \mathbb{R} \to \mathbb{R}$ that sends $x$ to $|x|$. It is 
continuous since it is $d(x, 0)$ and the distance function is continuous. The maximum and 
minimum are functions $\mathbb{R} \times \mathbb{R} \to \mathbb{R}$ given by the formulas 

$$\max(x, y) = \frac{x + y}{2} + \frac{|x - y|}{2} \qquad \min(x, y) = \frac{x + y}{2} - \frac{|x - y|}{2}$$

so they are also continuous. 
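The closed-form expressions for the maximum and minimum in terms of the sum and the absolute value can be checked numerically. This is a small sketch for illustration only; the function names `max_via_abs` and `min_via_abs` are hypothetical, and exact rationals are used so that the comparison with the built-in `max` and `min` involves no rounding.

```python
from fractions import Fraction


def max_via_abs(x, y):
    """Maximum written with only addition, halving, and absolute value."""
    return (x + y) / 2 + abs(x - y) / 2


def min_via_abs(x, y):
    """Minimum written with only addition, halving, and absolute value."""
    return (x + y) / 2 - abs(x - y) / 2


samples = [Fraction(-5), Fraction(0), Fraction(3), Fraction(7, 2)]
for x in samples:
    for y in samples:
        # Compare against the built-in max and min on every sample pair.
        print(x, y, max_via_abs(x, y) == max(x, y), min_via_abs(x, y) == min(x, y))
```

The point of the formulas is that they build max and min out of operations already known to be continuous, so no separate continuity argument is needed.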


61 Corollary The sums, differences, products, and quotients, absolute values, 
maxima, and minima of real-valued continuous functions are continuous. (The 
denominator functions should not equal zero.) 


Proof Take, for example, the sum $f + g$ where $f, g : M \to \mathbb{R}$ are continuous. It is 
the composite of continuous functions 

$$M \to \mathbb{R} \times \mathbb{R} \to \mathbb{R}$$
$$x \mapsto (fx, gx) \mapsto \operatorname{Sum}(fx, gx),$$

and is therefore continuous. The same applies to the other operations. □ 


62 Corollary Polynomials are continuous functions. 

Proof Proposition 3 states that constant functions and the identity function are 
continuous. Thus Corollary 61 and induction imply that the polynomial 
$a_0 + a_1 x + \cdots + a_n x^n$ is continuous. □ 


The same reasoning shows that polynomials of $m$ variables are continuous 
functions $\mathbb{R}^m \to \mathbb{R}$. 


Boundedness 

A subset $S$ of a metric space $M$ is bounded if for some $p \in M$ and some $r > 0$, 

$$S \subset M_r p.$$

A set which is not bounded is unbounded. For example, the elliptical region 
$4x^2 + y^2 < 4$ is a bounded subset of $\mathbb{R}^2$, while the hyperbola $xy = 1$ is unbounded. It is 
easy to see that if $S$ is bounded then for each $q \in M$ there is an $s$ such that $M_s q$ 
contains $S$. 

Distinguish the word “bounded” from the word “finite.” The first refers to 
physical size, the second to the number of elements. The concepts are totally different. 
Also, boundedness has little connection to the existence of a boundary - a clopen 
subset of a metric space has empty boundary, but some clopen sets are bounded, 
others not. 

Exercise 39 asks you to show that every convergent sequence is bounded, and to 
decide whether it is also true that every Cauchy sequence is bounded, even when the 
metric space is not complete. 

Boundedness is not a topological property. For example, $(-1, 1)$ and $\mathbb{R}$ are 
homeomorphic although $(-1, 1)$ is bounded and $\mathbb{R}$ is unbounded. The same example shows 
that completeness is not a topological property. 

A function from $M$ to another metric space $N$ is a bounded function if its 
range is a bounded subset of $N$. That is, there exist $q \in N$ and $r > 0$ such that 

$$fM \subset N_r q.$$

Note that a function can be bounded even though its graph is not. For example, 
$x \mapsto \sin x$ is a bounded function $\mathbb{R} \to \mathbb{R}$ although its graph, $\{(x, y) \in \mathbb{R}^2 : y = \sin x\}$, 
is an unbounded subset of $\mathbb{R}^2$. 


7 Coverings 

For the sake of simplicity we have postponed discussing compactness in terms of open 
coverings until this point. Typically, students find coverings a challenging concept. 
It is central, however, to much of analysis - for example, measure theory. 

Definition A collection $\mathcal{U}$ of subsets of $M$ covers $A \subset M$ if $A$ is contained in the 
union of the sets belonging to $\mathcal{U}$. The collection $\mathcal{U}$ is a covering of $A$. If $\mathcal{U}$ and $\mathcal{V}$ 
both cover $A$ and if $\mathcal{V} \subset \mathcal{U}$ in the sense that each set $V \in \mathcal{V}$ belongs also to $\mathcal{U}$ then 
we say that $\mathcal{U}$ reduces to $\mathcal{V}$, and that $\mathcal{V}$ is a subcovering of $A$. 

Definition If all the sets in a covering $\mathcal{U}$ of $A$ are open then $\mathcal{U}$ is an open covering 
of $A$. If every open covering of $A$ reduces to a finite subcovering of $A$ then we say 
that $A$ is covering compact†. 

The idea is that if $A$ is covering compact and $\mathcal{U}$ is an open covering of $A$ then 
just a finite number of the open sets are actually doing the work of covering $A$. The 
rest are redundant. 

†You will frequently find it said that an open covering of $A$ has a finite subcovering. “Has” means 
“reduces to.” 

A covering $\mathcal{U}$ of $A$ is also called a cover of $A$. The members of $\mathcal{U}$ are not called 
covers. Instead, you could call them scraps or patches. Imagine the covering as a 
patchwork quilt that covers a bed, the quilt being sewn together from overlapping 
scraps of cloth. See Figure 46. 



Figure 46 A covering of A by eight scraps. The set A is cross-hatched. 

The scraps are two discs, two rectangles, two ellipses, a pentagon, and a 
triangle. Each point of A belongs to at least one scrap. 

The mere existence of a finite open covering of A is trivial and utterly worthless. 
Every set A has such a covering, namely the single open set M . Rather, for A to 
be covering compact, each and every open covering of A must reduce to a finite 
subcovering of A. Deciding directly whether this is so is daunting. How could you 
hope to verify the finite reducibility of all open coverings of A? There are so many of 
them. For this reason we concentrated on sequential compactness; it is relatively easy 
to check by inspection whether every sequence in a set has a convergent subsequence. 

To check that a set is not covering compact it suffices to find an open covering 
which fails to reduce to a finite subcovering. Occasionally this is simple. For example, 
the set $(0, 1]$ is not covering compact in $\mathbb{R}$ because its covering 

$$\mathcal{U} = \{(1/n, 2) : n \in \mathbb{N}\}$$

fails to reduce to a finite subcovering. 
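To see concretely why no finite subfamily of the intervals $(1/n, 2)$ covers $(0, 1]$, one can exhibit a missed point for any finite choice of indices. The sketch below is an illustration, not part of the original text; the names `covers` and `uncovered_point` are hypothetical.

```python
from fractions import Fraction


def covers(n, x):
    """Is x in the open interval (1/n, 2)?"""
    return Fraction(1, n) < x < 2


def uncovered_point(indices):
    """Given finitely many indices n, return a point of (0, 1] that lies in
    none of the intervals (1/n, 2) with n among the indices."""
    largest = max(indices)
    return Fraction(1, largest)    # 1/largest <= 1/n for every chosen n


chosen = [1, 5, 12]
x = uncovered_point(chosen)
print(x, [covers(n, x) for n in chosen])
```

The missed point $1/N$, with $N$ the largest chosen index, is still covered by the member $(1/(N+1), 2)$ of the full covering, which is why the infinite family succeeds where every finite subfamily fails.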

63 Theorem For a subset A of a metric space M the following are equivalent: 

(a) A is covering compact. 

(b) A is sequentially compact. 

Proof that (a) implies (b) We assume $A$ is covering compact and prove it is 
sequentially compact. Suppose not. Then there is a sequence $(p_n)$ in $A$, no subsequence 
of which converges in $A$. Each point $a \in A$ therefore has some neighborhood $M_r a$ 
such that $p_n \in M_r a$ only finitely often. (The radius $r$ may depend on the point $a$.) 
The collection $\{M_r a : a \in A\}$ is an open covering of $A$ and by covering compactness 
it reduces to a finite subcovering 

$$\{M_{r_1}(a_1), M_{r_2}(a_2), \ldots, M_{r_k}(a_k)\}$$

of $A$. Since $p_n$ appears in each of these finitely many neighborhoods $M_{r_i}(a_i)$ only 
finitely often, it follows from the pigeonhole principle that $(p_n)$ has only finitely many 
terms, a contradiction. Thus $(p_n)$ cannot exist, and $A$ is sequentially compact. □ 

The following presentation of the proof that (b) implies (a) appears in Royden’s 
book, Real Analysis. A Lebesgue number for a covering $\mathcal{U}$ of $A$ is a positive real 
number $\lambda$ such that for each $a \in A$ there is some $U \in \mathcal{U}$ with $M_\lambda a \subset U$. Of course, 
the choice of this $U$ depends on $a$. It is crucial, however, that the Lebesgue number 
$\lambda$ is independent of $a \in A$. 

The idea of a Lebesgue number is that we know each point $a \in A$ is contained in 
some $U \in \mathcal{U}$, and if $\lambda$ is extremely small then $M_\lambda a$ is just a slightly swollen point - 
so the same should be true for it too. No matter where in $A$ the neighborhood $M_\lambda a$ 
is placed, it should lie wholly in some member of the covering. See Figure 47. 

If $A$ is noncompact then it may have open coverings with no positive Lebesgue 
number. For example, let $A = (0, 1) \subset \mathbb{R} = M$. The singleton collection $\{A\}$ is 
an open covering of $A$, but there is no $\lambda > 0$ such that for every $a \in A$ we have 
$(a - \lambda, a + \lambda) \subset A$. See Exercise 86. 

64 Lebesgue Number Lemma Every open covering of a sequentially compact set 
has a Lebesgue number $\lambda > 0$. 

Figure 47 Small neighborhoods are like swollen points. $\mathcal{U}$ has a positive 
Lebesgue number $\lambda$. The $\lambda$-neighborhood of each point in the cross-hatched 
set $A$ is wholly contained in at least one member of the covering. 

Proof Suppose not: $\mathcal{U}$ is an open covering of a sequentially compact set $A$, and yet 
for each $\lambda > 0$ there exists an $a \in A$ such that no $U \in \mathcal{U}$ contains $M_\lambda a$. Take $\lambda = 1/n$ 
and let $a_n \in A$ be a point such that no $U \in \mathcal{U}$ contains $M_{1/n}(a_n)$. By sequential 
compactness, there is a subsequence $(a_{n_k})$ converging to some point $p \in A$. Since $\mathcal{U}$ 
is an open covering of $A$, there exist $r > 0$ and $U \in \mathcal{U}$ with $M_r p \subset U$. If $k$ is large 
then $d(a_{n_k}, p) < r/2$ and $1/n_k < r/2$, which implies by the triangle inequality that 

$$M_{1/n_k}(a_{n_k}) \subset M_r p \subset U,$$

contrary to the supposition that no $U \in \mathcal{U}$ contains $M_{1/n}(a_n)$. We conclude that, 
after all, $\mathcal{U}$ does have a Lebesgue number $\lambda > 0$. See Figure 48. □ 

Figure 48 The neighborhood $M_r p$ engulfs the smaller neighborhood 
$M_{1/n_k}(a_{n_k})$. 

Proof that (b) implies (a) in Theorem 63 Let $\mathcal{U}$ be an open covering of the 
sequentially compact set $A$. We want to reduce $\mathcal{U}$ to a finite subcovering. By the 
Lebesgue Number Lemma, $\mathcal{U}$ has a Lebesgue number $\lambda > 0$. Choose any $a_1 \in A$ and 
some $U_1 \in \mathcal{U}$ such that 

$$M_\lambda(a_1) \subset U_1.$$

If $U_1 \supset A$ then $\mathcal{U}$ reduces to the finite subcovering $\{U_1\}$ consisting of a single set, 
and the implication (b) $\Rightarrow$ (a) is proved. On the other hand, as is more likely, if $U_1$ 
does not contain $A$ then we choose a point $a_2 \in A \setminus U_1$ and $U_2 \in \mathcal{U}$ such that 

$$M_\lambda(a_2) \subset U_2.$$

Either $\mathcal{U}$ reduces to the finite subcovering $\{U_1, U_2\}$ (and the proof is finished) or 
else we can continue, eventually producing a sequence $(a_n)$ in $A$ and a sequence $(U_n)$ 
in $\mathcal{U}$ such that 

$$M_\lambda(a_n) \subset U_n \quad \text{and} \quad a_{n+1} \in (A \setminus (U_1 \cup \cdots \cup U_n)).$$

We will show that such sequences $(a_n)$, $(U_n)$ lead to a contradiction. By sequential 
compactness, there is a subsequence $(a_{n_k})$ that converges to some $p \in A$. For a large 
$k$ we have $d(a_{n_k}, p) < \lambda$ and 

$$p \in M_\lambda(a_{n_k}) \subset U_{n_k}.$$

See Figure 49. 

All $a_{n_\ell}$ with $\ell > k$ lie outside $U_{n_k}$, which contradicts their convergence to $p$. Thus, 
at some finite stage the process of choosing points $a_n$ and sets $U_n$ terminates, and $\mathcal{U}$ 
reduces to a finite subcovering $\{U_1, \ldots, U_n\}$ of $A$, which implies that $A$ is covering 
compact. See also the remark on page 421. □ 

Figure 49 The point $a_{n_k}$ is so near $p$ that the neighborhood $M_\lambda(a_{n_k})$ 
engulfs $p$. 

Upshot In light of Theorem 63, the term “compact” may now be applied equally to 
any set obeying (a) or (b). 

Total Boundedness 

The Heine-Borel Theorem states that a subset of $\mathbb{R}^m$ is compact if and only if 
it is closed and bounded. In more general metric spaces, such as $\mathbb{Q}$, the assertion is 
false. But what if the metric space is complete? As remarked on page 81 it is still 
false. 

But mathematicians do not quit easily. The Heine-Borel Theorem ought to 
generalize beyond $\mathbb{R}^m$ somehow. Here is the concept we need: A set $A \subset M$ is totally 
bounded if for each $\epsilon > 0$ there exists a finite covering of $A$ by $\epsilon$-neighborhoods. No 
mention is made of a covering reducing to a subcovering. How close total boundedness 
is to the worthless fact that every metric space has a finite open covering! 

65 Generalized Heine-Borel Theorem A subset of a complete metric space is 
compact if and only if it is closed and totally bounded. 

Proof Let $A$ be a compact subset of $M$. Therefore it is closed. To see that it is 
totally bounded, let $\epsilon > 0$ be given and consider the covering of $A$ by $\epsilon$-neighborhoods, 

$$\{M_\epsilon x : x \in A\}.$$

Compactness of $A$ implies that this covering reduces to a finite subcovering and 
therefore $A$ is totally bounded. 

Conversely, assume that $A$ is a closed and totally bounded subset of the complete 
metric space $M$. We claim that $A$ is sequentially compact. That is, every sequence 
$(a_n)$ in $A$ has a subsequence that converges in $A$. Set $\epsilon_k = 1/k$, $k = 1, 2, \ldots$. Since 
$A$ is totally bounded we can cover it by finitely many $\epsilon_1$-neighborhoods 

$$M_{\epsilon_1}(q_1), \ldots, M_{\epsilon_1}(q_m).$$

By the pigeonhole principle, terms of the sequence $a_n$ lie in at least one of these 
neighborhoods infinitely often, say it is $M_{\epsilon_1}(p_1)$. Choose 

$$a_{n_1} \in A_1 = A \cap M_{\epsilon_1}(p_1).$$

Every subset of a totally bounded set is totally bounded, so we can cover $A_1$ by finitely 
many $\epsilon_2$-neighborhoods. For one of them, say $M_{\epsilon_2}(p_2)$, $a_n$ lies in $A_2 = A_1 \cap M_{\epsilon_2}(p_2)$ 
infinitely often. Choose $a_{n_2} \in A_2$ with $n_2 > n_1$. 

Proceeding inductively, cover $A_{k-1}$ by finitely many $\epsilon_k$-neighborhoods, one of 
which, say $M_{\epsilon_k}(p_k)$, contains terms of the sequence $(a_n)$ infinitely often. Then choose 
$a_{n_k} \in A_k = A_{k-1} \cap M_{\epsilon_k}(p_k)$ with $n_k > n_{k-1}$. Then $(a_{n_k})$ is a subsequence of $(a_n)$. It 
is Cauchy. For if $\epsilon > 0$ is given we choose $N$ such that $2/N < \epsilon$. If $k, \ell \geq N$ then 

$$a_{n_k}, a_{n_\ell} \in A_N \quad \text{and} \quad \operatorname{diam} A_N \leq 2\epsilon_N = \frac{2}{N} < \epsilon,$$

which shows that $(a_{n_k})$ is Cauchy. Completeness of $M$ implies that $(a_{n_k})$ converges 
to some $p \in M$ and since $A$ is closed we have $p \in A$. Hence $A$ is compact. □ 

66 Corollary A metric space is compact if and only if it is complete and totally 
bounded. 

Proof Every compact metric space M is complete. This is because, given a Cauchy 
sequence (p n ) in M, compactness implies that some subsequence converges in M, 
and if a Cauchy sequence has a convergent subsequence then the mother sequence 
converges too. As observed above, compactness immediately gives total boundedness. 

Conversely, assume that M is complete and totally bounded. Every metric space 
is closed in itself. By Theorem 65, $M$ is compact. □ 

8 Cantor Sets 

Cantor sets are fascinating examples of compact sets that are maximally disconnected. 
(To emphasize the disconnectedness, one sometimes refers to a Cantor set as “Cantor 
dust.”) Here is how to construct the standard Cantor set. Start with the unit 
interval [0,1] and remove its open middle third, (1/3, 2/3). Then remove the open 
middle third from the remaining two intervals, and so on. This gives a nested sequence 
$C^0 \supset C^1 \supset C^2 \supset \ldots$ where $C^0 = [0, 1]$, $C^1$ is the union of the two intervals $[0, 1/3]$ 
and $[2/3, 1]$, $C^2$ is the union of four intervals $[0, 1/9]$, $[2/9, 1/3]$, $[2/3, 7/9]$, and $[8/9, 1]$, 
$C^3$ is the union of eight intervals, and so on. See Figure 50. 

Figure 50 The construction of the standard middle-thirds Cantor set $C$ 

In general $C^n$ is the union of $2^n$ closed intervals, each of length $1/3^n$. Each $C^n$ is 
compact. The standard middle thirds Cantor set is the nested intersection 

$$C = \bigcap_{n=0}^{\infty} C^n.$$

We refer to $C$ as “the” Cantor set. Clearly it contains the endpoints of each of 
the intervals comprising $C^n$. Actually, it contains uncountably many more points 
than these endpoints! There are other Cantor sets defined by removing, say, middle 
fourths, pairs of middle tenths, etc. All Cantor sets turn out to be homeomorphic to 
the standard Cantor set. See Section 9. 
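The middle-thirds construction is easy to carry out mechanically. The following sketch, added for illustration, lists the closed intervals that make up the $n$-th stage of the construction; the function name `cantor_stage` is hypothetical, and exact rationals are used so that the endpoints come out as the fractions quoted in the text.

```python
from fractions import Fraction


def cantor_stage(n):
    """Return the 2**n closed intervals of the n-th stage, left to right,
    as (left endpoint, right endpoint) pairs."""
    intervals = [(Fraction(0), Fraction(1))]            # stage 0 is [0, 1]
    for _ in range(n):
        refined = []
        for left, right in intervals:
            third = (right - left) / 3
            refined.append((left, left + third))        # keep the left third
            refined.append((right - third, right))      # keep the right third
        intervals = refined                              # open middle thirds removed
    return intervals


for left, right in cantor_stage(2):
    print(left, right)
```

Only finitely many stages can be listed this way; the Cantor set itself is the intersection of all stages, so the code illustrates the approximating sets rather than the set $C$.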

A metric space $M$ is totally disconnected if each point $p \in M$ has arbitrarily 
small clopen neighborhoods. That is, given $\epsilon > 0$ and $p \in M$, there exists a clopen 
set $U$ such that 

$$p \in U \subset M_\epsilon p.$$

For example, every discrete space is totally disconnected. So is $\mathbb{Q}$. 

67 Theorem The Cantor set is a compact, nonempty, perfect, and totally 
disconnected metric space. 


Proof The metric on $C$ is the one it inherits from $\mathbb{R}$, the usual distance $|x - y|$. Let 
$E$ be the set of endpoints of all the $C^n$-intervals, 

$$E = \{0, 1, 1/3, 2/3, 1/9, 2/9, 7/9, 8/9, \ldots\}.$$

Clearly $E$ is denumerable and contained in $C$, so $C$ is nonempty and infinite. It is 
compact because it is the intersection of compacts. 

To show $C$ is perfect and totally disconnected, take any $x \in C$ and any $\epsilon > 0$. 
Fix $n$ so large that $1/3^n < \epsilon$. The point $x$ lies in one of the $2^n$ intervals $I$ of length 
$1/3^n$ that comprise $C^n$. Fix this $I$. The set $E \cap I$ is infinite and contained in the 
interval $(x - \epsilon, x + \epsilon)$. Thus $C$ clusters at $x$ and $C$ is perfect. See Figure 51. 

Figure 51 The endpoints of $C$ cluster at $x$. 


The interval $I$ is closed in $\mathbb{R}$ and therefore in $C^n$. The complement $J = C^n \setminus I$ 
consists of finitely many closed intervals and is therefore closed too. Thus, $I$ and $J$ are 
clopen in $C^n$. By the Inheritance Principle their intersections with $C$ are clopen in $C$, 
so $C \cap I$ is a clopen neighborhood of $x$ in $C$ which is contained in the $\epsilon$-neighborhood 
of $x$, completing the proof that $C$ is totally disconnected. □ 

68 Corollary The Cantor set is uncountable. 

Proof Being compact, C is complete, and by Theorem 56, every complete, perfect, 
nonempty metric space is uncountable. □ 

A more direct way to see that the Cantor set is uncountable involves a geometric 
coding scheme. Take the code 0 = left and 2 = right. Then 

$$C_0 = \text{left interval} = [0, 1/3] \qquad C_2 = \text{right interval} = [2/3, 1],$$

and $C^1 = C_0 \sqcup C_2$. Similarly, the left and right subintervals of $C_0$ are coded $C_{00}$ 
and $C_{02}$, while the left and right subintervals of $C_2$ are $C_{20}$ and $C_{22}$. This gives 

$$C^2 = C_{00} \sqcup C_{02} \sqcup C_{20} \sqcup C_{22}.$$

The intervals that comprise $C^3$ are specified by strings of length 3. For instance, $C_{220}$ 
is the left subinterval of $C_{22}$. In general an interval of $C^n$ is coded by an address 
string of $n$ symbols, each a 0 or a 2. Read it like a zip code. The first symbol gives 
the interval’s gross location (left or right), the second symbol refines the location, the 
third refines it more, and so on. 

Imagine now an infinite address string $\omega = \omega_1 \omega_2 \omega_3 \ldots$ of zeros and twos. 
Corresponding to $\omega$, we form a nested sequence of intervals 

$$C_{\omega_1} \supset C_{\omega_1 \omega_2} \supset C_{\omega_1 \omega_2 \omega_3} \supset \cdots \supset C_{\omega_1 \ldots \omega_n} \supset \cdots,$$

the intersection of which is a point $p = p(\omega) \in C$. Specifically, 

$$p(\omega) = \bigcap_{n \in \mathbb{N}} C_{\omega|n}$$

where $\omega|n = \omega_1 \ldots \omega_n$ truncates $\omega$ to an address of length $n$. See Theorem 34. 

As we have observed, each infinite address string defines a point in the Cantor set. 
Conversely, each point $p \in C$ has an address $\omega = \omega(p)$: its first $n$ symbols $\alpha = \omega|n$ 
are specified by the interval $C_\alpha$ of $C^n$ in which $p$ lies. A second point $q$ has a different 
address, since there is some $n$ for which $p$ and $q$ lie in distinct intervals $C_\alpha$ and $C_\beta$ 
of $C^n$. 

In sum, the Cantor set is in one-to-one correspondence with the set $\Omega$ of addresses. 
Each address $\omega \in \Omega$ defines a point $p(\omega) \in C$ and each point $p \in C$ has a unique 
address $\omega(p)$. The set $\Omega$ is uncountable. In fact it corresponds bijectively to $\mathbb{R}$. See 
Exercise 112. 
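The address scheme can be made concrete for finite addresses. The sketch below, added as an illustration, computes the interval coded by a finite string of 0s and 2s; the function name `address_interval` is hypothetical. It uses the fact that reading the address as ternary digits gives the left endpoint $\sum_k \alpha_k / 3^k$ of the coded interval, whose length is $1/3^n$ for an address of length $n$.

```python
from fractions import Fraction


def address_interval(address):
    """Return (left, right) for the interval coded by a finite address,
    given as a string of the symbols '0' (left) and '2' (right)."""
    left, length = Fraction(0), Fraction(1)
    for symbol in address:
        if symbol not in ("0", "2"):
            raise ValueError("addresses use only the symbols 0 and 2")
        length /= 3                     # each symbol refines by a factor of 3
        if symbol == "2":
            left += 2 * length          # step over the left and middle thirds
    return left, left + length


print(address_interval("0"))      # the left interval of the first stage
print(address_interval("2"))      # the right interval of the first stage
print(address_interval("220"))    # the left subinterval of the interval coded 22
```

Truncating an infinite address to longer and longer prefixes and applying this function produces the nested intervals whose intersection is the point with that address.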

If $S \subset M$ and $\overline{S} = M$ then $S$ is dense in $M$. For example, $\mathbb{Q}$ is dense in $\mathbb{R}$. The 
set $S$ is somewhere dense if there exists an open nonempty set $U \subset M$ such that 
$\overline{S \cap U} \supset U$. If $S$ is not somewhere dense then it is nowhere dense. 


69 Theorem The Cantor set contains no interval and is nowhere dense in R. 


Proof Suppose not and $C$ contains $(a, b)$. Then $(a, b) \subset C^n$ for all $n \in \mathbb{N}$. Take $n$ 
with $1/3^n < b - a$. Since $(a, b)$ is connected it lies wholly in a single $C^n$-interval, say 
$I$. But $I$ has smaller length than $(a, b)$, which is absurd, so $C$ contains no interval. 

Next, suppose $C$ is dense in some nonempty open set $U \subset \mathbb{R}$, i.e., the closure of 
$C \cap U$ contains $U$. Thus 

$$C = \overline{C} \supset \overline{C \cap U} \supset U \supset (a, b),$$

contrary to the fact that $C$ contains no interval. □ 

The existence of an uncountable nowhere dense set is astonishing. Even more is 
true: The Cantor set is a zero set - it has “outer measure zero.” By this we mean 
that, given any $\epsilon > 0$, there is a countable covering of $C$ by open intervals $(a_k, b_k)$, 
and the total length of the covering is 

$$\sum_{k=1}^{\infty} (b_k - a_k) < \epsilon.$$

(Outer measure is one of the central concepts of Lebesgue Theory. See Chapter 6.) 
After all, $C$ is a subset of $C^n$, which consists of $2^n$ closed intervals, each of length 
$1/3^n$. If $n$ is large enough then $2^n / 3^n < \epsilon$. Enlarging each of these closed intervals to 
an open interval keeps the sum of the lengths $< \epsilon$, and it follows that $C$ is a zero set. 
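The zero-set argument can be turned into an explicit finite covering. The following is a sketch under stated assumptions, not the text's own construction: the names `closed_intervals` and `open_cover_of_cantor` are hypothetical, and the particular amount by which each closed interval is widened (half of the unused length, shared equally among the intervals) is one convenient choice among many.

```python
from fractions import Fraction


def closed_intervals(n):
    """The 2**n closed intervals of the n-th stage of the construction."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        intervals = [piece
                     for a, b in intervals
                     for piece in ((a, a + (b - a) / 3), (b - (b - a) / 3, b))]
    return intervals


def open_cover_of_cantor(eps):
    """Return open intervals (as endpoint pairs) that cover the Cantor set
    and have total length less than eps, where eps > 0."""
    eps = Fraction(eps)
    n = 0
    while Fraction(2, 3) ** n >= eps:             # find n with (2/3)**n < eps
        n += 1
    slack = (eps - Fraction(2, 3) ** n) / 2       # total extra length to spend
    delta = slack / (2 * 2 ** n)                  # widening on each side
    return [(a - delta, b + delta) for a, b in closed_intervals(n)]


cover = open_cover_of_cantor(Fraction(1, 10))
print(len(cover), sum(b - a for a, b in cover))
```

Since the Cantor set lies inside every stage of the construction, covering one stage by slightly larger open intervals covers the Cantor set as well.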

If we discard subintervals of $[0, 1]$ in a different way, we can make a fat Cantor 
set - one that has positive outer measure. Instead of discarding the middle-thirds of 
intervals at the $n$th stage in the construction, we discard only the middle $1/n!$ portion. 
The discards are grossly smaller than the remaining intervals. See Figure 52. The 
total amount discarded from $[0, 1]$ is $< 1$, and the total amount remaining, the outer 
measure of the fat Cantor set, is positive. See Exercise 3.31. 


Figure 52 In forming a fat Cantor set, the gap intervals occupy a 
progressively smaller proportion of the Cantor set intervals. 


9* Cantor Set Lore 

In this section, we explore some arcane features of Cantor sets. 

Although the continuous image of a connected set is connected, the continuous 
image of a disconnected set may well be connected. Just crush the disconnected set 
to a single point. Nevertheless, I hope you find the following result striking, for it 
means that the Cantor set C is the universal compact metric space, of which all 
others are merely shadows. 

70 Cantor Surjection Theorem Given a compact nonempty metric space M , there 
is a continuous surjection of C onto M. 

See Figure 53. Exercise 114 suggests a direct construction of a continuous 
surjection $C \to [0, 1]$, which is already an interesting fact. The proof of Theorem 70 
involves a careful use of the address notation from Section 8 and the following simple 
lemma about dividing a compact metric space $M$ into small pieces. A piece of $M$ is 
any compact nonempty subset of $M$. 

Figure 53 $\sigma$ surjects $C$ onto $M$. 

71 Lemma If $M$ is a nonempty compact metric space and $\epsilon > 0$ is given then $M$ 
can be expressed as the finite union of pieces, each of diameter $\leq \epsilon$. 

Proof Reduce the covering $\{M_{\epsilon/2}(x) : x \in M\}$ of $M$ to a finite subcovering and take 
the closure of each member of the subcovering. □ 

We say that M divides into these small pieces. The metaphor is imperfect 
because the pieces may overlap. The strategy of the proof of Theorem 70 is to divide 
M into large pieces, divide the large pieces into small pieces, divide the small pieces 
into smaller pieces and continue indefinitely. Labeling the pieces coherently with 
words in two letters leads to the Cantor surjection. 

Let $W(n)$ be the set of words in two letters, say $a$ and $b$, having length $n$. Then 
$\#W(n) = 2^n$. For example $W(2)$ consists of the four words $aa$, $bb$, $ab$, and $ba$. 

Using Lemma 71 we divide M into a finite number of pieces of diameter < 1 and 
we denote by Mi the collection of these pieces. We choose n\ with 2 ni > #Mi and 
choose any surjection w\ : W(n{) — > Mi. Since there are enough words in VF(ni), w\ 
exists. We say w\ labels Mi and if w\(a) — L then a is a label of L. 
Then we divide each $L \in \mathcal{M}_1$ into finitely many smaller pieces. Let $\mathcal{M}_2(L)$ be the
collection of these smaller pieces and let

$$\mathcal{M}_2 = \bigcup_{L \in \mathcal{M}_1} \mathcal{M}_2(L).$$

Choose $n_2$ such that $2^{n_2} \ge \max\{\#\mathcal{M}_2(L) : L \in \mathcal{M}_1\}$ and label $\mathcal{M}_2$ with words
$\alpha\beta \in W(n_1 + n_2)$ such that

If $L = w_1(\alpha)$ then $\alpha\beta$ labels the pieces $S \in \mathcal{M}_2(L)$
as $\beta$ varies in $W(n_2)$.

This labeling amounts to a surjection $w_2 : W(n_1 + n_2) \to \mathcal{M}_2$ that is coherent with
$w_1$ in the sense that $\beta \mapsto w_2(\alpha\beta)$ labels the pieces $S \in \mathcal{M}_2(w_1(\alpha))$. Since there are enough
words in $W(n_2)$, $w_2$ exists. If there are other labels $\alpha'$ of $L \in \mathcal{M}_1$ then we get other
labels $\alpha'\beta'$ for the pieces $S \in \mathcal{M}_2(L)$. We make no effort to correlate them.

Proceeding by induction we get finer and finer divisions of M coherently labeled
with longer and longer words. More precisely there is a sequence of divisions $(\mathcal{M}_k)$
and surjections $w_k : W_k = W(n_1 + \cdots + n_k) \to \mathcal{M}_k$ such that

(a) The maximum diameter of the pieces $L \in \mathcal{M}_k$ tends to zero as $k \to \infty$.

(b) $\mathcal{M}_{k+1}$ refines $\mathcal{M}_k$ in the sense that each $S \in \mathcal{M}_{k+1}$ is contained in some $L \in \mathcal{M}_k$.
(“The small pieces S are contained in the large pieces L.”)

(c) If $L \in \mathcal{M}_k$ and $\mathcal{M}_{k+1}(L)$ denotes $\{S \in \mathcal{M}_{k+1} : S \subset L\}$ then

$$L = \bigcup_{S \in \mathcal{M}_{k+1}(L)} S.$$

(d) The labelings $w_k$ are coherent in the sense that if $w_k(\alpha) = L \in \mathcal{M}_k$ then
$\beta \mapsto w_{k+1}(\alpha\beta)$ labels $\mathcal{M}_{k+1}(L)$ as $\beta$ varies in $W(n_{k+1})$.

See Figure 54. 
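
The coherent labeling can be tried out in a toy case. The Python sketch below is only an illustration under stated assumptions: it takes M = [0, 1], divides every piece into three closed thirds (which overlap at their endpoints), and uses $n_k = 2$ at every stage, so that four words label three pieces and one piece is overlabeled. The helper names `divide`, `label`, and `piece_for` are hypothetical and do not come from the text.

```python
from fractions import Fraction
from itertools import product

def divide(piece):
    """Split a closed interval (a, b) into three closed thirds; neighbors share an endpoint."""
    a, b = piece
    t = (b - a) / 3
    return [(a, a + t), (a + t, a + 2 * t), (a + 2 * t, b)]

def label(pieces, n):
    """A surjection from the words of length n in the letters a, b onto the given pieces.
    Surplus words are sent to the last piece, which is therefore overlabeled."""
    words = ["".join(w) for w in product("ab", repeat=n)]
    assert len(words) >= len(pieces), "not enough words of this length"
    return {w: pieces[min(i, len(pieces) - 1)] for i, w in enumerate(words)}

def piece_for(word, n=2):
    """Follow a word whose length is a multiple of n through successive labeled divisions of [0, 1]."""
    piece = (Fraction(0), Fraction(1))
    for i in range(0, len(word), n):
        piece = label(divide(piece), n)[word[i:i + n]]
    return piece
```

With these choices `piece_for("abaa")` follows the block ab and then the block aa, ending in the interval [1/3, 4/9], while the words ba and bb both label [2/3, 1]. Longer words give nested pieces whose diameters shrink by a factor of 3 at each stage, in line with properties (a)-(d).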

Proof of the Cantor Surjection Theorem We are given a nonempty compact
metric space M and we seek a continuous surjection $\sigma : C \to M$ where C is the
standard Cantor set.

$C = \bigcap C_n$ where $C_n$ is the disjoint union of $2^n$ closed intervals of length $1/3^n$.
In Section 8 we labeled these $C_n$-intervals with words in the letters 0 and 2 having
length n. (For instance $C_{220}$ is the left $C_3$-interval of $C_{22} = [8/9, 1]$, namely
$C_{220} = [8/9, 25/27]$.) We showed there is a natural bijection between C and the set of all
infinite words in the letters 0 and 2 defined by

$$p = \bigcap_{n \in \mathbb{N}} C_{\omega|n}.$$
Figure 54 Coherently labeled successive divisions of M. They have
$n_1 = 2$, $n_2 = 1$, and $n_3 = 6$. Note that overlabeling is necessary.
(In the figure, the pieces of $\mathcal{M}_1$ carry the labels aa, bb, ab, ba, and the pieces of
$\mathcal{M}_2$ carry the labels aaa, bba, aab, bbb, aba, baa, abb, bab.)


We referred to $\omega = \omega(p)$ as the address of p. ($\omega|n$ is the truncation of $\omega$ to its first n
letters.) See page 107.

For $k = 1, 2, \ldots$ let $\mathcal{M}_k$ be the fine divisions of M constructed above, coherently
labeled by $w_k$. They obey (a)-(d). Given $p \in C$ we look at the nested sequence of
pieces $L_k(p) \in \mathcal{M}_k$ such that $L_k(p) = w_k(\omega|(n_1 + \cdots + n_k))$ where $\omega = \omega(p)$. That
is, we truncate $\omega(p)$ to its first $n_1 + \cdots + n_k$ letters and look at the piece in $\mathcal{M}_k$ with
this label. (We replace the letters 0 and 2 with a and b.) Then $(L_k(p))$ is a nested
decreasing sequence of nonempty compact sets whose diameters tend to 0 as $k \to \infty$.
Thus $\bigcap L_k(p)$ is a well defined point in M and we set

$$\sigma(p) = \bigcap_{k \in \mathbb{N}} L_k(p).$$


We must show that $\sigma$ is a continuous surjection $C \to M$. Continuity is simple. If
$p, p' \in C$ are close together then for large n the first n entries of their addresses are
equal. This implies that $\sigma(p)$ and $\sigma(p')$ belong to a common $L_k$ and k is large. Since
the diameter of $L_k$ tends to 0 as $k \to \infty$ we get continuity.

Surjectivity is also simple. Each $q \in M$ is the intersection of at least one nested
sequence of pieces $L_k \in \mathcal{M}_k$. For q belongs to some piece $L_1 \in \mathcal{M}_1$, and it also belongs
to some subpiece $L_2 \in \mathcal{M}_2(L_1)$, etc. Coherence of the labeling of the $\mathcal{M}_k$ implies that
for each nested sequence $(L_k)$ there is an infinite word $\alpha = \alpha_1\alpha_2\alpha_3\cdots$ such that
$\alpha_i \in W(n_i)$ and $L_k = w_k(\alpha_1 \ldots \alpha_k)$, a word of length $m = n_1 + \cdots + n_k$. The point $p \in C$ with
address $\alpha$ is sent by $\sigma$ to q. □
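
For M = [0, 1] the whole construction can be carried out by hand, and the following Python sketch does so under simplifying assumptions that are not part of the proof: the pieces at stage k are the closed dyadic intervals of length $1/2^k$, every $n_k$ equals 1, and the letters 0 and 2 select the left and right half. The function names `address` and `sigma_piece` are hypothetical.

```python
from fractions import Fraction

def address(p, n):
    """First n letters (0 or 2) of the address of a point p of the standard Cantor set."""
    lo, hi = Fraction(0), Fraction(1)
    word = []
    for _ in range(n):
        third = (hi - lo) / 3
        if p <= lo + third:        # p lies in the left subinterval
            word.append(0)
            hi = lo + third
        elif p >= hi - third:      # p lies in the right subinterval
            word.append(2)
            lo = hi - third
        else:                      # p fell into a removed middle third
            raise ValueError("p is not in the Cantor set")
    return word

def sigma_piece(p, k):
    """The dyadic piece L_k(p) of [0, 1] labeled by the first k letters of the address of p."""
    lo, hi = Fraction(0), Fraction(1)
    for letter in address(p, k):
        mid = (lo + hi) / 2
        lo, hi = (lo, mid) if letter == 0 else (mid, hi)
    return lo, hi
```

The points 1/3 and 2/3 have addresses 0222... and 2000..., and their pieces `sigma_piece(p, k)` shrink to the same point 1/2 from the left and from the right. In this toy case the surjection identifies the two endpoints of a gap interval, so it is far from injective, which is consistent with the pieces being allowed to overlap.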


Peano Curves 

72 Theorem There exists a Peano curve, a continuous path in the plane which is
space-filling in the sense that its image has nonempty interior. In fact there is a
Peano curve whose image is the closed unit disc $B^2$.


Proof Let $\sigma : C \to B^2$ be a continuous surjection supplied by Theorem 70. Extend
$\sigma$ to a map $\tau : [0, 1] \to B^2$ by setting

$$\tau(x) = \begin{cases} \sigma(x) & \text{if } x \in C \\ (1 - t)\sigma(a) + t\sigma(b) & \text{if } x = (1 - t)a + tb \in (a, b) \text{ and } (a, b) \text{ is a gap interval.} \end{cases}$$

A gap interval is an interval $(a, b) \subset C^c$ such that $a, b \in C$. Because $\sigma$ is continuous,
$|\sigma(a) - \sigma(b)| \to 0$ as $|a - b| \to 0$. Hence $\tau$ is continuous. Its image includes the disc
$B^2$ and thus has nonempty interior. In fact the image of $\tau$ is exactly $B^2$, since the
disc is convex and $\tau$ just extends $\sigma$ via linear interpolation. See Figure 55. □
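
The interpolation step in this proof is easy to express in code. The Python sketch below is a hedged illustration, not the proof itself: `sigma` is any function defined on points of C with values in the plane (supplied by the caller, hypothetical here), and `gap_interval` searches only to a fixed depth, so a point that is not in C but still lies in $C_{40}$ is treated as if it were in C.

```python
from fractions import Fraction

def gap_interval(x, depth=40):
    """Return the gap interval (a, b) containing x in [0, 1], or None if x stays in C_depth."""
    lo, hi = Fraction(0), Fraction(1)
    for _ in range(depth):
        third = (hi - lo) / 3
        if x <= lo + third:
            hi = lo + third
        elif x >= hi - third:
            lo = hi - third
        else:
            return lo + third, hi - third   # x lies in this open middle third
    return None

def extend(sigma, x):
    """tau(x): sigma(x) on C, and the linear interpolation of sigma across each gap interval."""
    gap = gap_interval(x)
    if gap is None:
        return sigma(x)
    a, b = gap
    t = (x - a) / (b - a)                    # x = (1 - t) a + t b
    return tuple((1 - t) * u + t * v for u, v in zip(sigma(a), sigma(b)))
```

For instance, with the stand-in map `sigma = lambda p: (p, p * p)` (chosen only to exercise the code; it is not a surjection onto the disc), the point 1/2 lies in the gap (1/3, 2/3) with t = 1/2, and `extend(sigma, Fraction(1, 2))` is the midpoint of sigma(1/3) and sigma(2/3).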


This Peano curve cannot be one-to-one since C is not homeomorphic to $B^2$. (C
is disconnected while $B^2$ is connected.) In fact no Peano curve $\tau$ can be one-to-one.
See Exercise 102.


Cantor Spaces 

We say that M is a Cantor space if, like the standard Cantor set C, it is compact, 
nonempty, perfect, and totally disconnected. 

73 Moore-Kline Theorem Every Cantor space M is homeomorphic to the stan- 
dard middle-thirds Cantor set C. 


A Cantor piece is a nonempty clopen subset S of a Cantor space M. It is easy
to see that S is also a Cantor space. See Exercise 100. Since a Cantor space is totally
disconnected, each point has a small clopen neighborhood N. Thus, a Cantor space
can always be divided into two disjoint Cantor pieces, $M = U \sqcup U^c$.
Figure 55 Filling in the Cantor surjection $\sigma$ to make a Peano space-filling
curve $\tau$


74 Cantor Partition Lemma Given a Cantor space M and $\epsilon > 0$, there is a number
N such that for each $d \ge N$ there is a partition of M into d Cantor pieces of
diameter $\le \epsilon$. (We care most about dyadic d.)


Proof A partition of a set is a division of it into disjoint subsets. In this case
the small Cantor pieces form a partition of the Cantor space M. Since M is totally
disconnected and compact, we can cover it with finitely many clopen neighborhoods
$U_1, \ldots, U_m$ having diameter $\le \epsilon$. To make the sets $U_i$ disjoint, define

$$\begin{aligned} V_1 &= U_1 \\ V_2 &= U_2 \setminus U_1 \\ &\;\;\vdots \\ V_m &= U_m \setminus (U_1 \cup \cdots \cup U_{m-1}). \end{aligned}$$

If any $V_i$ is empty, discard it. This gives a partition $M = X_1 \sqcup \cdots \sqcup X_N$ into $N \le m$
Cantor pieces of diameter $\le \epsilon$.

If d = N this finishes the proof. If d > N then we inductively divide $X_N$ into
two, and then three, and eventually d - N + 1 disjoint Cantor pieces; say

$$X_N = Y_1 \sqcup \cdots \sqcup Y_{d-N+1}.$$

The partition $M = X_1 \sqcup \cdots \sqcup X_{N-1} \sqcup Y_1 \sqcup \cdots \sqcup Y_{d-N+1}$ finishes the proof. □
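
The disjointification used in this proof, in which each set of the cover gives up the points already claimed by earlier sets, is a purely set-theoretic step. As a minimal sketch, assuming finite Python sets as stand-ins for the clopen sets of the cover (the function name `disjointify` is hypothetical):

```python
def disjointify(cover):
    """Turn a cover U_1, ..., U_m into disjoint sets V_i = U_i minus (U_1 ∪ ... ∪ U_{i-1}),
    discarding any V_i that comes out empty."""
    seen, parts = set(), []
    for u in cover:
        v = set(u) - seen      # the part of U_i not covered by earlier sets
        if v:
            parts.append(v)
        seen |= set(u)
    return parts
```

The resulting sets are pairwise disjoint, have the same union as the cover, and each is contained in one of the original sets, so no diameter increases. In the lemma it also matters that differences of clopen sets are clopen, which finite sets cannot illustrate.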

Proof of the Moore-Kline Theorem We are given a Cantor space M and we 
seek a homeomorphism from the standard Cantor set C onto M . 

By Lemma 74 there is a partition $\mathcal{M}_1$ of M into $d_1$ nonempty Cantor pieces where
$d_1 = 2^{n_1}$ is dyadic and the pieces have diameter $\le 1$. Thus there is a bijection
$w_1 : W_1 \to \mathcal{M}_1$ where $W_1 = W(n_1)$.

According to the same lemma, each $L \in \mathcal{M}_1$ can be partitioned into N(L) Cantor
pieces of diameter $\le 1/2$. Choose a dyadic number

$$d_2 = 2^{n_2} \ge \max\{N(L) : L \in \mathcal{M}_1\}$$

and use the lemma again to partition each L into $d_2$ smaller Cantor pieces. These
pieces constitute $\mathcal{M}_2(L)$, and we set $\mathcal{M}_2 = \bigcup_L \mathcal{M}_2(L)$. It is a partition of M having
cardinality $d_1 d_2$ and in the natural way described in the proof of Theorem 70 it is
coherently labeled by $W_2 = W(n_1 + n_2)$. Specifically, for each $L \in \mathcal{M}_1$ there is a
bijection $w_L : W(n_2) \to \mathcal{M}_2(L)$ and we define $w_2 : W_2 \to \mathcal{M}_2$ by $w_2(\alpha\beta) = S \in \mathcal{M}_2$ if
and only if $w_1(\alpha) = L$ and $w_L(\beta) = S$. This $w_2$ is a bijection.

Proceeding in exactly the same way, we pass from 2 to 3, from 3 to 4, and 
eventually from k to k + 1, successively refining the partitions and extending the 
bijective labelings. 

The Cantor surjection constructed in the proof of Theorem 70 is

$$\sigma(p) = \bigcap_k L_k(p)$$

where $L_k(p) \in \mathcal{M}_k$ has label $\omega(p)|m$ with $m = n_1 + \cdots + n_k$. Distinct points $p, p' \in C$
have distinct addresses $\omega, \omega'$. Because the labelings $w_k$ are bijections and the divisions
$\mathcal{M}_k$ are partitions, $\omega \ne \omega'$ implies that for some k, $L_k(p) \ne L_k(p')$, and thus
$\sigma(p) \ne \sigma(p')$. That is, $\sigma$ is a continuous bijection $C \to M$. A continuous bijection from one
compact to another is a homeomorphism. □

75 Corollary Every two Cantor spaces are homeomorphic. 

Proof Immediate from the Moore-Kline Theorem: Each is homeomorphic to C. □ 
76 Corollary The fat Cantor set is homeomorphic to the standard Cantor set.

Proof Immediate from the Moore-Kline Theorem. □ 

77 Corollary A Cantor set is homeomorphic to its own Cartesian square; that is,
$C \cong C \times C$.

Proof It is enough to check that $C \times C$ is a Cantor space. It is. See Exercise 99. □

The fact that a nontrivial space is homeomorphic to its own Cartesian square is 
disturbing, is it not? 

Ambient Topological Equivalence 

Although all Cantor spaces are homeomorphic to each other when considered as 
abstract metric spaces, they can present themselves in very different ways as subsets 
of Euclidean space. Two sets A, B in $\mathbb{R}^m$ are ambiently homeomorphic if there is
a homeomorphism of $\mathbb{R}^m$ to itself that sends A onto B. For example, the sets

$$A = \{0\} \cup [1,2] \cup \{3\} \quad \text{and} \quad B = \{0\} \cup \{1\} \cup [2,3]$$

are homeomorphic when considered as metric spaces, but there is no ambient homeomorphism
of R that carries A to B. Similarly, the trefoil knot in $\mathbb{R}^3$ is homeomorphic
but not ambiently homeomorphic in $\mathbb{R}^3$ to a planar circle. See also Exercise 105.

78 Theorem Every two Cantor spaces in R are ambiently homeomorphic. 

Let M be a Cantor space contained in R. According to Theorem 73, M is home- 
omorphic to the standard Cantor set C. We want to find a homeomorphism of R to 
itself that carries C to M . 

The convex hull of $S \subset \mathbb{R}^m$ is the smallest convex set H that contains S. When
m = 1, H is the smallest interval that contains S.

79 Lemma A Cantor space $M \subset \mathbb{R}$ can be divided into two Cantor pieces whose
convex hulls are disjoint.

Proof Obvious from one-dimensionality of R: Choose a point $x \in \mathbb{R} \setminus M$ such that
some points of M lie to the left of x and others lie to its right. Then

$$M = (M \cap (-\infty, x)) \sqcup ((x, \infty) \cap M)$$

divides M into disjoint Cantor pieces whose convex hulls are disjoint closed intervals. □
Proof of Theorem 78 Let $M \subset \mathbb{R}$ be a Cantor space. We will find a homeomorphism
$\tau : \mathbb{R} \to \mathbb{R}$ sending C to M. Lemma 79 leads to Cantor divisions $\mathcal{M}_k$ such that
the convex hulls of the pieces in each $\mathcal{M}_k$ are disjoint. With respect to the left/right
order of R, label these pieces in the same way that the Cantor middle third intervals
are labeled: $L_0$ and $L_2$ in $\mathcal{M}_1$ are the left and right pieces of M, $L_{00}$ and $L_{02}$ are
the left and right pieces of $L_0$, and so on. Then the homeomorphism $\sigma : C \to M$
constructed in Theorems 70 and 73 is automatically monotone increasing. Extend $\sigma$
across the gap intervals affinely as was done in the proof of Theorem 72, and extend
it to $\mathbb{R} \setminus [0, 1]$ in any affine increasing fashion such that $\tau(0) = \sigma(0)$ and $\tau(1) = \sigma(1)$.
Then $\tau : \mathbb{R} \to \mathbb{R}$ extends $\sigma$ to R. The monotonicity of $\sigma$ implies that $\tau$ is one-to-one,
while the continuity of $\sigma$ implies that $\tau$ is continuous. Thus $\tau : \mathbb{R} \to \mathbb{R}$ is a homeomorphism
that carries C onto M.

If $M' \subset \mathbb{R}$ is a second Cantor space and $\tau' : \mathbb{R} \to \mathbb{R}$ is a homeomorphism that
sends C onto M' then $\tau' \circ \tau^{-1}$ is a homeomorphism of R that sends M onto M'. □

As an example, one may construct a Cantor set in R by removing from [0, 1] its 
middle third, then removing from each of the remaining intervals nine symmetrically 
placed subintervals; then removing from each of the remaining twenty intervals, four 
asymmetrically placed subintervals; and so forth. In the limit (if the lengths of the 
remaining intervals tend to zero) we get a nonstandard Cantor set M. According to 
Theorem 78, there is a homeomorphism of R to itself sending the standard Cantor 
set C onto M. 

Another example is the fat Cantor set mentioned on page 108. It too is ambiently 
homeomorphic to C. 

Theorem Every two Cantor spaces in $\mathbb{R}^2$ are ambiently homeomorphic.

We do not prove this theorem here. The key step is to show M has a dyadic disc
partition. That is, M can be divided into a dyadic number of Cantor pieces, each
piece contained in the interior of a small topological disc $D_i$, the $D_i$ being mutually
disjoint. (A topological disc is any homeomorph of the closed unit disc $B^2$. Smallness
refers to diam $D_i$.) The proofs I know of the existence of such dyadic partitions are
tricky cut-and-paste arguments and are beyond the scope of this book. See Moise’s
book, Geometric Topology in Dimensions 2 and 3 and also Exercise 138.

Antoine’s Necklace 

A Cantor space $M \subset \mathbb{R}^m$ is tame if there is an ambient homeomorphism
$h : \mathbb{R}^m \to \mathbb{R}^m$ that carries the standard Cantor set C (imagined to lie on the $x_1$-axis
in $\mathbb{R}^m$) onto M. If M is not tame it is wild. Cantor spaces contained in the line
or plane are tame. In 3-space, however, there are wild ones, Cantor sets A so badly
embedded in $\mathbb{R}^3$ that they act like curves. It is the lack of a “ball dyadic partition
lemma” that causes the problem.

The first wild Cantor set was discovered by Louis Antoine, and is known as
Antoine’s Necklace. The construction involves the solid torus or anchor ring,
which is homeomorphic to the Cartesian product $B^2 \times S^1$. It is easy to imagine a
necklace of solid tori: Take an ordinary steel chain and modify it so its first and last
links are also linked. See Figure 56.



Figure 56 A necklace of twenty solid tori 

Antoine’s construction then goes like this. Draw a solid torus $A^0$. Interior to $A^0$,
draw a necklace $A^1$ of several small solid tori, and make the necklace encircle the
hole of $A^0$. Repeat the construction on each solid torus T comprising $A^1$. That is,
interior to each T, draw a necklace of very small solid tori so that it encircles the hole
of T. The result is a set $A^2 \subset A^1$ which is a necklace of necklaces. In Figure 56, $A^2$
would consist of 400 solid tori. Continue indefinitely, producing a nested decreasing
sequence $A^0 \supset A^1 \supset A^2 \supset \cdots$. The set $A^n$ is compact and consists of a large
number ($20^n$) of extremely small solid tori arranged in a hierarchy of necklaces. It
is an nth order necklace. The intersection $A = \bigcap A^n$ is a Cantor space, since it is
compact, perfect, nonempty, and totally disconnected. It is homeomorphic to C. See
Exercise 139.

Certainly A is bizarre, but is it wild? Is there no ambient homeomorphism h of
$\mathbb{R}^3$ that sends the standard Cantor set C onto A? The reason that h cannot exist is
explained next.



Figure 57 $\kappa$ loops through $A^0$, which contains the necklace of solid tori.


Referring to Figure 57, the loop $\kappa$ passing through the hole of $A^0$ cannot be
continuously shrunk to a point in $\mathbb{R}^3$ without hitting A. For if such a motion of $\kappa$
avoids A then, by compactness, it also avoids one of the high-order necklaces $A^n$. In
$\mathbb{R}^3$ it is impossible to continuously de-link two linked loops, and it is also impossible
to continuously de-link a loop from a necklace of loops. (These facts are intuitively
believable but hard to prove. See Dale Rolfsen’s book, Knots and Links.)

On the other hand, each loop $\lambda$ in $\mathbb{R}^3 \setminus C$ can be continuously shrunk to a point
without hitting C. For there is no obstruction to pushing $\lambda$ through the gap intervals
of C.

Now suppose that there is an ambient homeomorphism h of $\mathbb{R}^3$ that sends C to
A. Then $\lambda = h^{-1}(\kappa)$ is a loop in $\mathbb{R}^3 \setminus C$, and it can be shrunk to a point in $\mathbb{R}^3 \setminus C$,
avoiding C. Applying h to this motion of $\lambda$ continuously shrinks $\kappa$ to a point, avoiding
A, which we have indicated is impossible. Hence h cannot exist, and A is wild.
10* Completion

Many metric spaces are complete (for example, every closed subset of Euclidean space 
is complete), and completeness is a reasonable property to require of a metric space, 
especially in light of the following theorem. 

80 Completion Theorem Every metric space can be completed. 


This means that just as R completes Q, we can take any metric space M and find
a complete metric space $\hat{M}$ containing M whose metric extends the metric of M. To
put it another way, M is always a metric subspace of a complete metric space. In a
natural sense the completion is uniquely determined by M.

81 Lemma Given four points $p, q, x, y \in M$, we have

$$|d(p, q) - d(x, y)| \le d(p, x) + d(q, y).$$

Proof The triangle inequality implies that

$$\begin{aligned} d(x, y) &\le d(x, p) + d(p, q) + d(q, y) \\ d(p, q) &\le d(p, x) + d(x, y) + d(y, q), \end{aligned}$$

and hence

$$-(d(p, x) + d(q, y)) \le d(p, q) - d(x, y) \le (d(p, x) + d(q, y)).$$

A number sandwiched between $-k$ and $k$ has magnitude $\le k$, which completes the
proof. □
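
As a sanity check rather than a proof, Lemma 81 can be tested exhaustively on a small finite metric space. The Python sketch below assumes the taxicab (sum) metric on a 3 by 3 integer grid, chosen so that all arithmetic is exact; the names `d`, `lemma_81_slack`, `grid`, and `slacks` are hypothetical.

```python
from itertools import product

def d(p, q):
    """Taxicab (sum) metric on the integer grid, so every distance is an exact integer."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def lemma_81_slack(p, q, x, y):
    """d(p,x) + d(q,y) - |d(p,q) - d(x,y)|, which Lemma 81 says is never negative."""
    return d(p, x) + d(q, y) - abs(d(p, q) - d(x, y))

grid = list(product(range(-1, 2), repeat=2))                  # nine sample points
slacks = [lemma_81_slack(*quad) for quad in product(grid, repeat=4)]
```

Over all $9^4$ quadruples the smallest slack is 0, attained for example when p = x and q = y, so the inequality of the lemma cannot be made strict.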


Proof of the Completion Theorem 80 We consider the collection $\mathcal{C}$ of all Cauchy
sequences in M, convergent or not, and convert it into the completion of M. (This is a
bold idea, is it not?) Cauchy sequences $(p_n)$ and $(q_n)$ are co-Cauchy if $d(p_n, q_n) \to 0$
as $n \to \infty$. Co-Cauchyness is an equivalence relation on $\mathcal{C}$. (This is easy to check.)

Define $\hat{M}$ to be $\mathcal{C}$ modulo the equivalence relation of being co-Cauchy. Points of
$\hat{M}$ are equivalence classes $P = [(p_n)]$ such that $(p_n)$ is a Cauchy sequence in M. The
metric on $\hat{M}$ is

$$D(P, Q) = \lim_{n \to \infty} d(p_n, q_n),$$

where $P = [(p_n)]$ and $Q = [(q_n)]$. It only remains to verify three things:

(a) D is a well defined metric on $\hat{M}$.

(b) $M \subset \hat{M}$.

(c) $\hat{M}$ is complete.
None of these assertions is really hard to prove, although the details are somewhat
messy because of possible equivalence class/representative ambiguity.

(a) By Lemma 81

$$|d(p_m, q_m) - d(p_n, q_n)| \le d(p_m, p_n) + d(q_m, q_n).$$

Thus $(d(p_n, q_n))$ is a Cauchy sequence in R, and because R is complete,

$$L = \lim_{n \to \infty} d(p_n, q_n)$$

exists. Let $(p'_n)$ and $(q'_n)$ be sequences that are co-Cauchy with $(p_n)$ and $(q_n)$, and let

$$L' = \lim_{n \to \infty} d(p'_n, q'_n).$$

Then

$$|L - L'| \le |L - d(p_n, q_n)| + |d(p_n, q_n) - d(p'_n, q'_n)| + |d(p'_n, q'_n) - L'|.$$

As $n \to \infty$, the first and third terms tend to 0. By Lemma 81, the middle term is

$$|d(p_n, q_n) - d(p'_n, q'_n)| \le d(p_n, p'_n) + d(q_n, q'_n),$$

which also tends to 0 as $n \to \infty$. Hence $L = L'$ and D is well defined on $\hat{M}$. The
d-distance on M is symmetric and satisfies the triangle inequality. Taking limits,
these properties carry over to D on $\hat{M}$, while positive definiteness follows directly
from the co-Cauchy definition.

(b) Think of each $p \in M$ as a constant sequence, $\bar{p} = (p, p, p, p, \ldots)$. Clearly it
is Cauchy and clearly the D-distance between two constant sequences $\bar{p}$ and $\bar{q}$ is the
same as the d-distance between the points p and q. In this way M is naturally a
metric subspace of $\hat{M}$.

(c) Let $(P_k)_{k \in \mathbb{N}}$ be a Cauchy sequence in $\hat{M}$. We must find $Q \in \hat{M}$ to which
$P_k$ converges as $k \to \infty$. (Note that $(P_k)$ is a sequence of equivalence classes, not
a sequence of points in M, and convergence refers to D not d.) Because D is well
defined we can use a trick to shorten the proof. Observe that every subsequence of
a Cauchy sequence is Cauchy, and it and the mother sequence are co-Cauchy. For
all the terms far along in the subsequence are also far along in the mother sequence.
This lets us take a representative of $P_k$ all of whose terms are at distance < 1/k from
each other. Call this sequence $(p_{k,n})_{n \in \mathbb{N}}$. We have $[(p_{k,n})] = P_k$.

Set $q_n = p_{n,n}$. We claim that $(q_n)$ is Cauchy and $D(P_k, Q) \to 0$ as $k \to \infty$, where
$Q = [(q_n)]$. That is, $\hat{M}$ is complete.
Let $\epsilon > 0$ be given. There exists $N \ge 3/\epsilon$ such that if $k, \ell \ge N$ then

$$D(P_k, P_\ell) \le \frac{\epsilon}{3}$$

and

$$\begin{aligned} d(q_k, q_\ell) = d(p_{k,k}, p_{\ell,\ell}) &\le d(p_{k,k}, p_{k,n}) + d(p_{k,n}, p_{\ell,n}) + d(p_{\ell,n}, p_{\ell,\ell}) \\ &\le \frac{1}{k} + d(p_{k,n}, p_{\ell,n}) + \frac{1}{\ell} \\ &\le \frac{2\epsilon}{3} + d(p_{k,n}, p_{\ell,n}). \end{aligned}$$

The inequality is valid for all n and the left-hand side, $d(q_k, q_\ell)$, does not depend on
n. The limit of $d(p_{k,n}, p_{\ell,n})$ as $n \to \infty$ is $D(P_k, P_\ell)$, which we know to be $\le \epsilon/3$.
Thus, if $k, \ell \ge N$ then $d(q_k, q_\ell) \le \epsilon$ and $(q_n)$ is Cauchy. Similarly we see that $P_k \to Q$
as $k \to \infty$. For, given $\epsilon > 0$, we choose $N \ge 2/\epsilon$ such that if $k, n \ge N$ then
$d(q_k, q_n) \le \epsilon/2$, from which it follows that

$$\begin{aligned} d(p_{k,n}, q_n) &\le d(p_{k,n}, p_{k,k}) + d(p_{k,k}, q_n) \\ &= d(p_{k,n}, p_{k,k}) + d(q_k, q_n) \\ &\le \frac{1}{k} + \frac{\epsilon}{2} \le \epsilon. \end{aligned}$$

The limit of the left-hand side of this inequality, as $n \to \infty$, is $D(P_k, Q)$. Thus

$$\lim_{k \to \infty} P_k = Q$$

and $\hat{M}$ is complete. □


Uniqueness of the completion is not surprising, and is left as Exercise 106. A 
different proof of the Completion Theorem is sketched in Exercise 4.39. 
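
To make the co-Cauchy relation and the limit defining D concrete, here is a small Python sketch in the metric space Q with d(p, q) = |p - q|. It is an illustration under stated assumptions, not part of the proof: the two sequences `newton` and `bisect` (both hypothetical names) are Cauchy sequences of rationals chosen because they approach the same gap in Q, and `approx_D` merely evaluates $d(p_n, q_n)$ at one index n.

```python
from fractions import Fraction

def newton(n):
    """n-th term of a Cauchy sequence in Q: Newton's iteration for x*x = 2 started at 2."""
    x = Fraction(2)
    for _ in range(n):
        x = (x + 2 / x) / 2
    return x

def bisect(n):
    """n-th term of another Cauchy sequence in Q: bisection for x*x = 2 on [1, 2]."""
    lo, hi = Fraction(1), Fraction(2)
    for _ in range(n):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if mid * mid < 2 else (lo, mid)
    return lo

def approx_D(p, q, n):
    """d(p_n, q_n); as n grows this tends to D([p], [q])."""
    return abs(p(n) - q(n))
```

The values `approx_D(newton, bisect, n)` shrink toward 0, so the two sequences are co-Cauchy and represent the same point of the completion, even though neither converges in Q. Against the constant sequence 1, 1, 1, ... the values settle near 0.4142, the D-distance from that point to 1. Keep n small for `newton`, since the exact fractions roughly double in length at each step.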

A Second Construction of R from Q

In the particular case that the metric space M is Q, the Completion Theorem leads 
to a construction of R from Q via Cauchy sequences. Note, however, that applying 
the theorem as it stands involves circular reasoning, for its proof uses completeness 
of R to define the metric D. Instead, we use only the Cauchy sequence strategy. 

Convergence and Cauchyness for sequences of rational numbers are concepts that 
make perfect sense without a priori knowledge of R. Just take all epsilons and deltas 
in the definitions to be rational. The Cauchy completion $\hat{\mathbb{Q}}$ of $\mathbb{Q}$ is the collection
$\mathcal{C}$ of Cauchy sequences in $\mathbb{Q}$ modulo the equivalence relation of being co-Cauchy.

We claim that $\hat{\mathbb{Q}}$ is a complete ordered field. That is, $\hat{\mathbb{Q}}$ is just another version of
R. The arithmetic on $\hat{\mathbb{Q}}$ is defined by

$$P + Q = [(p_n + q_n)] \qquad P - Q = [(p_n - q_n)]$$

$$PQ = [(p_n q_n)] \qquad P/Q = [(p_n / q_n)]$$

where $P = [(p_n)]$ and $Q = [(q_n)]$. Of course $Q \ne [(0, 0, \ldots)]$ in the fraction P/Q.
Exercise 134 asks you to check that these natural definitions make $\hat{\mathbb{Q}}$ a field. Although
there are many things to check - well definedness, commutativity, and so forth - all
are effortless. There are no sixteen case proofs as with cuts. Also, just as with metric
spaces, $\mathbb{Q}$ is naturally a subfield of $\hat{\mathbb{Q}}$ when we think of $r \in \mathbb{Q}$ as the constant sequence
$\bar{r} = [(r, r, \ldots)]$.


That’s the easy part - now the rest. 

To define the order relation on $\hat{\mathbb{Q}}$ we rework some of the cut ideas. If $P \in \hat{\mathbb{Q}}$ has
a representative $(p_n)$ such that for some $\epsilon > 0$, we have $p_n > \epsilon$ for all n then P is
positive. If $-P$ is positive then P is negative.


Then we define $P \prec Q$ if $Q - P$ is positive. Exercise 135 asks you to check that
this defines an order on $\hat{\mathbb{Q}}$, consistent with the standard order < on $\mathbb{Q}$ in the sense
that for all $p, q \in \mathbb{Q}$ we have $p < q \iff \bar{p} \prec \bar{q}$. In particular, you are asked to prove
the trichotomy property: Each $P \in \hat{\mathbb{Q}}$ is either positive, negative, or zero, and these
possibilities are mutually exclusive.


Combining Cauchyness with the definition of $\prec$ gives

$$P = [(p_n)] \prec Q = [(q_n)] \iff \text{there exist } \epsilon > 0 \text{ and } N \in \mathbb{N} \text{ such that for all } m, n \ge N \text{ we have } p_m + \epsilon < q_n. \tag{1}$$


It remains to check the least upper bound property. Let $\mathcal{P}$ be a nonempty subset
of $\hat{\mathbb{Q}}$ that is bounded above. We must find a least upper bound for $\mathcal{P}$.

First of all, since $\mathcal{P}$ is bounded there is a $B = [(b_n)] \in \hat{\mathbb{Q}}$ such that $P \prec B$ for
all $P \in \mathcal{P}$. We can choose B so its terms lie at distance < 1 from each other. Set
$b = b_1 + 1$. Then b is an upper bound for $\mathcal{P}$. Since Q is Archimedean there is an
integer $m \ge b$, and m is also an upper bound for $\mathcal{P}$. By the same reasoning $\mathcal{P}$ has
upper bounds r such that r is a dyadic fraction with arbitrarily large denominator
$2^n$.
Since $\mathcal{P}$ is nonempty, the same reasoning shows that there are dyadic fractions s
with large denominators such that s is not an upper bound for $\mathcal{P}$.

We assert that the least upper bound for $\mathcal{P}$ is the equivalence class Q of the
following Cauchy sequence $(q_0, q_1, q_2, \ldots)$.

(a) $q_0$ is the smallest integer such that $q_0$ is an upper bound for $\mathcal{P}$.

(b) $q_1$ is the smallest fraction with denominator 2 such that $q_1$ is an upper bound
for $\mathcal{P}$.

(c) $q_2$ is the smallest fraction with denominator 4 such that $q_2$ is an upper bound
for $\mathcal{P}$.

(d) . . .

(e) $q_n$ is the smallest fraction with denominator $2^n$ such that $q_n$ is an upper bound
for $\mathcal{P}$.

The sequence $(q_n)$ is well defined because some but not all dyadic fractions with
denominator $2^n$ are upper bounds for $\mathcal{P}$. By construction $(q_n)$ is monotone decreasing
and $q_{n-1} - q_n \le 1/2^n$. Thus, if $m \le n$ then

$$\begin{aligned} 0 \le q_m - q_n &= q_m - q_{m+1} + q_{m+1} - q_{m+2} + \cdots + q_{n-1} - q_n \\ &\le \frac{1}{2^{m+1}} + \cdots + \frac{1}{2^n} < \frac{1}{2^m}. \end{aligned}$$

It follows that $(q_n)$ is Cauchy and $Q = [(q_n)] \in \hat{\mathbb{Q}}$.

Suppose that Q is not an upper bound for $\mathcal{P}$. Then there is some $P = [(p_n)] \in \mathcal{P}$
with $Q \prec P$. By (1), there is an $\epsilon > 0$ and an N such that for all $n \ge N$,

$$q_N + \epsilon < p_n.$$

It follows that $q_N \prec P$, a contradiction to $q_N$ being an upper bound for $\mathcal{P}$.

On the other hand suppose there is a smaller upper bound for $\mathcal{P}$, say $R = [(r_n)] \prec Q$.
By (1) there are $\epsilon > 0$ and N such that for all $m, n \ge N$,

$$r_m + \epsilon < q_n.$$

Fix a $k \ge N$ with $1/2^k < \epsilon$. Then for all $m \ge N$,

$$r_m < q_k - \epsilon < q_k - \frac{1}{2^k}.$$

By (1), $R \prec q_k - 1/2^k$. Since R is an upper bound for $\mathcal{P}$, so is $q_k - 1/2^k$, a contradiction
to $q_k$ being the smallest fraction with denominator $2^k$ such that $q_k$ is an upper
bound for $\mathcal{P}$. Therefore, Q is indeed a least upper bound for $\mathcal{P}$.
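
The sequence of smallest dyadic upper bounds can be computed explicitly in a simple case. The Python sketch below assumes, purely as an example, that the bounded set consists of the rationals x with x*x < 2, viewed as constant sequences; for this set a rational r is an upper bound exactly when r > 0 and r*r >= 2. The names `is_upper_bound` and `q` are hypothetical.

```python
from fractions import Fraction

def is_upper_bound(r):
    """Whether the rational r bounds the example set {x in Q : x*x < 2} from above."""
    return r > 0 and r * r >= 2

def q(n):
    """Smallest fraction k / 2**n that is an upper bound for the example set."""
    k = 2 * 2 ** n                       # start at 2 = k / 2**n, which is an upper bound
    while is_upper_bound(Fraction(k - 1, 2 ** n)):
        k -= 1                           # step down while the next smaller fraction still works
    return Fraction(k, 2 ** n)
```

The first few terms are 2, 3/2, 3/2, 3/2, 23/16. The sequence is monotone decreasing, and consecutive terms differ by either 0 or exactly $1/2^n$, which is why the estimate on $q_{n-1} - q_n$ must be a weak inequality.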
This completes the verification that the Cauchy completion of Q is a complete
ordered field. Uniqueness implies that it is isomorphic to the complete ordered field R
constructed by means of Dedekind cuts in Section 2 of Chapter 1. Decide for yourself
which of the two constructions of the real number system you like better - cuts
or Cauchy sequences. Cuts make least upper bounds straightforward and algebra
awkward, while with Cauchy sequences it is the reverse.
Exercises

1. An ant walks on the floor, ceiling, and walls of a cubical room. What metric
is natural for the ant’s view of its world? What metric would a spider consider
natural? If the ant wants to walk from a point p to a point q, how could it
determine the shortest path?

2. Why is the sum metric on $\mathbb{R}^2$ called the Manhattan metric and the taxicab
metric?

3. What is the set of points in $\mathbb{R}^3$ at distance exactly 1/2 from the unit circle $S^1$
in the plane,

$$T = \{p \in \mathbb{R}^3 : \exists q \in S^1 \text{ and } d(p, q) = 1/2 \text{ and for all } q' \in S^1 \text{ we have } d(p, q) \le d(p, q')\}?$$

4. Write out a proof that the discrete metric on a set M is actually a metric. 

5. For $p, q \in S^1$, the unit circle in the plane, let

$$d_a(p, q) = \min\{|\angle(p) - \angle(q)|, \; 2\pi - |\angle(p) - \angle(q)|\}$$

where $\angle(z) \in [0, 2\pi)$ refers to the angle that z makes with the positive x-axis.
Use your geometric talent to prove that $d_a$ is a metric on $S^1$.

6. For $p, q \in [0, \pi/2)$ let

$$d_s(p, q) = \sin|p - q|.$$

Use your calculus talent to decide whether $d_s$ is a metric.

7. Prove that every convergent sequence $(p_n)$ in a metric space M is bounded, i.e.,
that for some r > 0, some $q \in M$, and all $n \in \mathbb{N}$, we have $p_n \in M_r q$.

8. Consider a sequence $(x_n)$ in the metric space R.

(a) If $(x_n)$ converges in R prove that the sequence of absolute values $(|x_n|)$
converges in R.

(b) State the converse.

(c) Prove or disprove it.

9. A sequence $(x_n)$ in R increases if n < m implies $x_n \le x_m$. It strictly
increases if n < m implies $x_n < x_m$. It decreases or strictly decreases if
n < m always implies $x_n \ge x_m$ or always implies $x_n > x_m$. A sequence is
monotone if it increases or it decreases. Prove that every sequence in R which
is monotone and bounded converges in R.†

10. Prove that the least upper bound property is equivalent to the “monotone
sequence property” that every bounded monotone sequence converges.

†This is nicely expressed by Pierre Teilhard de Chardin, “Tout ce qui monte converge,” in a
different context.
11. Let $(x_n)$ be a sequence in R.

*(a) Prove that $(x_n)$ has a monotone subsequence.

(b) How can you deduce that every bounded sequence in R has a convergent
subsequence?

(c) Infer that you have a second proof of the Bolzano-Weierstrass Theorem in R.

(d) What about the Heine-Borel Theorem?

12. Let $(p_n)$ be a sequence and $f : \mathbb{N} \to \mathbb{N}$ be a bijection. The sequence $(q_k)_{k \in \mathbb{N}}$
with $q_k = p_{f(k)}$ is a rearrangement of $(p_n)$.

(a) Are limits of a sequence unaffected by rearrangement?

(b) What if f is an injection?

(c) A surjection?

13. Assume that $f : M \to N$ is a function from one metric space to another which
satisfies the following condition: If a sequence $(p_n)$ in M converges then the
sequence $(f(p_n))$ in N converges. Prove that f is continuous. [This result
improves Theorem 4.]

14. The simplest type of mapping from one metric space to another is an isometry.
It is a bijection $f : M \to N$ that preserves distance in the sense that for all
$p, q \in M$ we have

$$d_N(fp, fq) = d_M(p, q).$$

If there exists an isometry from M to N then M and N are said to be isometric,
$M \equiv N$. You might have two copies of a unit equilateral triangle, one centered
at the origin and one centered elsewhere. They are isometric. Isometric metric
spaces are indistinguishable as metric spaces.

(a) Prove that every isometry is continuous. 

(b) Prove that every isometry is a homeomorphism. 

(c) Prove that [0, 1] is not isometric to [0, 2]. 

15. Prove that isometry is an equivalence relation: If M is isometric to N, show
that N is isometric to M; show that each M is isometric to itself (what mapping
of M to M is an isometry?); if M is isometric to N and N is isometric to P,
show that M is isometric to P.

16. Is the perimeter of a square isometric to the circle? Homeomorphic? Explain. 

17. Which capital letters of the Roman alphabet are homeomorphic? Are any 
isometric? Explain. 

18. Is R homeomorphic to Q? Explain. 

19. Is Q homeomorphic to N? Explain. 

20. What function (given by a formula) is a homeomorphism from $(-1, 1)$ to R? Is
every open interval homeomorphic to (0, 1)? Why or why not?

21. Is the plane minus four points on the x-axis homeomorphic to the plane minus 
four points in an arbitrary configuration? 
22. If every closed and bounded subset of a metric space M is compact, does it
follow that M is complete? (Proof or counterexample.)

23. (0, 1) is an open subset of R but not of $\mathbb{R}^2$, when we think of R as the x-axis in
$\mathbb{R}^2$. Prove this.

24. For which intervals [a, b] in R is the intersection $[a, b] \cap \mathbb{Q}$ a clopen subset of the
metric space Q?

25. Prove directly from the definition of closed set that every singleton subset of a
metric space M is a closed subset of M. Why does this imply that every finite
set of points is also a closed set?

26. Prove that a set $U \subset M$ is open if and only if none of its points are limits of
its complement.

27. If $S, T \subset M$, a metric space, and $S \subset T$, prove that

(a) $\overline{S} \subset \overline{T}$.

(b) $\operatorname{int}(S) \subset \operatorname{int}(T)$.

28. A map $f : M \to N$ is open if for each open set $U \subset M$, the image set f(U) is
open in N.

(a) If f is open, is it continuous?

(b) If f is a homeomorphism, is it open?

(c) If f is an open, continuous bijection, is it a homeomorphism?

(d) If $f : \mathbb{R} \to \mathbb{R}$ is a continuous surjection, must it be open?

(e) If $f : \mathbb{R} \to \mathbb{R}$ is a continuous, open surjection, must it be a homeomorphism?

(f) What happens in (e) if R is replaced by the unit circle $S^1$?

29. Let $\mathcal{T}$ be the collection of open subsets of a metric space M, and $\mathcal{K}$ the collection
of closed subsets. Show that there is a bijection from $\mathcal{T}$ onto $\mathcal{K}$.

30. Consider a two-point set M = {a, b} whose topology consists of the two sets,
M and the empty set. Why does this topology not arise from a metric on M?

31. Prove the following.

(a) If U is an open subset of R then it consists of countably many disjoint
intervals $U = \bigsqcup U_i$. (Unbounded intervals $(-\infty, b)$, $(a, \infty)$, and $(-\infty, \infty)$
are permitted.)

(b) Prove that these intervals $U_i$ are uniquely determined by U. In other
words, there is only one way to express U as a disjoint union of open
intervals.

(c) If $U, V \subset \mathbb{R}$ are both open, so $U = \bigsqcup U_i$ and $V = \bigsqcup V_j$ where $U_i$ and $V_j$
are open intervals, show that U and V are homeomorphic if and only if
there are equally many $U_i$ and $V_j$.

32. Show that every subset of N is clopen. What does this tell you about every
function $f : \mathbb{N} \to M$, where M is a metric space?

33. (a) Find a metric space in which the boundary of M_r p is not equal to the
sphere of radius r at p, ∂(M_r p) ≠ {x ∈ M : d(x, p) = r}.

(b) Need the boundary be contained in the sphere?

34. Use the Inheritance Principle to prove Corollary 15.

35. Prove that S clusters at p if and only if for each r > 0 there is a point
q ∈ M_r p ∩ S, such that q ≠ p.

36. Construct a set with exactly three cluster points.

37. Construct a function f : R → R that is continuous only at points of Z.

38. Let X, Y be metric spaces with metrics d_X, d_Y, and let M = X × Y be their
Cartesian product. Prove that the three natural metrics d_E, d_max, and d_sum on
M are actually metrics. [Hint: Cauchy-Schwarz.]

39. (a) Prove that every convergent sequence is bounded. That is, if (p_n) converges
in the metric space M, prove that there is some neighborhood M_r q
containing the set {p_n : n ∈ N}.

(b) Is the same true for a Cauchy sequence in an incomplete metric space?

40. Let M be a metric space with metric d. Prove that the following are equivalent.

(a) M is homeomorphic to M equipped with the discrete metric.

(b) Every function f : M → M is continuous.

(c) Every bijection g : M → M is a homeomorphism.

(d) M has no cluster points.

(e) Every subset of M is clopen.

(f) Every compact subset of M is finite.

41. Let ‖ ‖ be any norm on R^m and let B = {x ∈ R^m : ‖x‖ ≤ 1}. Prove that B is
compact. [Hint: It suffices to show that B is closed and bounded with respect
to the Euclidean metric.]

42. What is wrong with the following “proof” of Theorem 28? “Let ((a_n, b_n)) be
any sequence in A × B where A and B are compact. Compactness implies the
existence of subsequences (a_{n_k}) and (b_{n_k}) converging to a ∈ A and b ∈ B as
k → ∞. Therefore ((a_{n_k}, b_{n_k})) is a subsequence of ((a_n, b_n)) that converges to
a limit in A × B, proving that A × B is compact.”

43. Assume that the Cartesian product of two nonempty sets A ⊂ M and B ⊂ N
is compact in M × N. Prove that A and B are compact.

44. Consider a function f : M → R. Its graph is the set

{(p, y) ∈ M × R : y = fp}.

(a) Prove that if f is continuous then its graph is closed (as a subset of M × R).

(b) Prove that if f is continuous and M is compact then its graph is compact.

(c) Prove that if the graph of f is compact then f is continuous.

(d) What if the graph is merely closed? Give an example of a discontinuous
function f : R → R whose graph is closed.



45. Draw a Cantor set C on the circle and consider the set A of all chords between 
points of C. 

(a) Prove that A is compact. 

*(b) Is A convex? 

46. Assume that A, B are compact, disjoint, nonempty subsets of M. Prove that
there are a_0 ∈ A and b_0 ∈ B such that for all a ∈ A and b ∈ B we have

d(a_0, b_0) ≤ d(a, b).

[The points a_0, b_0 are closest together.]

47. Suppose that A, B ⊂ R^2.

(a) If A and B are homeomorphic, are their complements homeomorphic? 
*(b) What if A and B are compact? 

***(c) What if A and B are compact and connected? 

48. Prove that there is an embedding of the line as a closed subset of the plane, 
and there is an embedding of the line as a bounded subset of the plane, but 
there is no embedding of the line as a closed and bounded subset of the plane. 

*49. Construct a subset A ⊂ R and a continuous bijection f : A → A that is not a
homeomorphism. [Hint: By Theorem 36 A must be noncompact.]

**50. Construct nonhomeomorphic connected, closed subsets A, B ⊂ R^2 for which
there exist continuous bijections f : A → B and g : B → A. [Hint: By
Theorem 36 A and B must be noncompact.]

***51. Do there exist nonhomeomorphic closed sets A, B ⊂ R for which there exist
continuous bijections f : A → B and g : B → A?

52. Let (A_n) be a nested decreasing sequence of nonempty closed sets in the metric
space M.

(a) If M is complete and diam A_n → 0 as n → ∞, show that ⋂ A_n is exactly
one point.

(b) To what assertions do the sets [n, ∞) provide counterexamples?

53. Suppose that (K_n) is a nested sequence of compact nonempty sets, K_1 ⊃ K_2 ⊃
..., and K = ⋂ K_n. If for some μ > 0, diam K_n ≥ μ for all n, is it true that
diam K ≥ μ?

54. If f : A → B and g : C → B such that A ⊂ C and for each a ∈ A we have
f(a) = g(a) then g extends f. We also say that f extends to g. Assume that
f : S → R is a uniformly continuous function defined on a subset S of a metric
space M.

(a) Prove that f extends to a uniformly continuous function f̄ : S̄ → R.

(b) Prove that f̄ is the unique continuous extension of f to a function defined
on S̄.

(c) Prove the same things when R is replaced with a complete metric space N.

55. The distance from a point p in a metric space M to a nonempty subset S ⊂ M
is defined to be dist(p, S) = inf{d(p, s) : s ∈ S}.

(a) Show that p is a limit of S if and only if dist(p, S) = 0.

(b) Show that p ↦ dist(p, S) is a uniformly continuous function of p ∈ M.

56. Prove that the 2-sphere is not homeomorphic to the plane.

57. If S is connected, is the interior of S connected? Prove this or give a
counterexample.

58. Theorem 49 states that the closure of a connected set is connected.

(a) Is the closure of a disconnected set disconnected?

(b) What about the interior of a disconnected set?

*59. Prove that every countable metric space (not empty and not a singleton) is
disconnected. [Astonishingly, there exists a countable topological space which
is connected. Its topology does not arise from a metric.]

60. (a) Prove that a continuous function f : M → R, all of whose values are
integers, is constant provided that M is connected.

(b) What if all the values are irrational?

61. Prove that the (double) cone {(x, y, z) ∈ R^3 : x^2 + y^2 = z^2} is path-connected.

62. Prove that the annulus A = {z ∈ R^2 : r < |z| < R} is connected.

63. A subset E of R^m is starlike if it contains a point p_0 (called a center for E)
such that for each q ∈ E, the segment between p_0 and q lies in E.

(a) If E is convex and nonempty prove that it is starlike.

(b) Why is the converse false?

(c) Is every starlike set connected?

(d) Is every connected set starlike? Why or why not?

*64. Suppose that E ⊂ R^m is open, bounded, and starlike, and p_0 is a center for E.

(a) Is it true or false that all points p_1 in a small enough neighborhood of p_0
are also centers for E?

(b) Is the set of centers convex?

(c) Is it closed as a subset of E?

(d) Can it consist of a single point?

65. Suppose that A, B ⊂ R^2 are convex, closed, and have nonempty interiors.

(a) Prove that A, B are the closure of their interiors.

(b) If A, B are compact, prove that they are homeomorphic.

[Hint: Draw a picture.]

66. (a) Prove that every connected open subset of R^m is path-connected.

(b) Is the same true for open connected subsets of the circle?

(c) What about connected nonopen subsets of the circle?

67. List the convex subsets of R up to homeomorphism. How many are there and
how many are compact?

68. List the closed convex sets in R^2 up to homeomorphism. There are nine. How
many are compact?

*69. Generalize Exercises 65 and 68 to R^3; to R^m.

70. Prove that (a, b) and [a, b) are not homeomorphic metric spaces.

71. Let M and N be nonempty metric spaces. 

(a) If M and N are connected prove that M × N is connected.

(b) What about the converse? 

(c) Answer the questions again for path-connectedness. 

72. Let H be the hyperbola {(x, y) ∈ R^2 : xy = 1 and x, y > 0} and let X be the
x-axis.

(a) Is the set S = X ∪ H connected?

(b) What if we replace H with the graph G of any continuous positive function
f : R → (0, ∞); is X ∪ G connected?

(c) What if f is everywhere positive but discontinuous at just one point?

73. Is the disc minus a countable set of points connected? Path-connected? What
about the sphere or the torus instead of the disc?

74. Let S = R^2 \ Q^2. (Points (x, y) ∈ S have at least one irrational coordinate.) Is
S connected? Path-connected? Prove or disprove.

*75. An arc is a path with no self-intersection. Define the concept of arc-connectedness 
and prove that a metric space is path-connected if and only if it is arc-connected. 

76. (a) The intersection of connected sets need not be connected. Give an example.

(b) Suppose that S_1, S_2, S_3, ... is a sequence of connected, closed subsets of
the plane and S_1 ⊃ S_2 ⊃ .... Is S = ⋂ S_n connected? Give a proof or
counterexample.

*(c) Does the answer change if the sets are compact?

(d) What is the situation for a nested decreasing sequence of compact
path-connected sets?

77. If a metric space M is the union of path-connected sets S_α, all of which have
the nonempty path-connected set K in common, is M path-connected?

78. (p_1, ..., p_n) is an ε-chain in a metric space M if for each i we have p_i ∈ M and
d(p_i, p_{i+1}) < ε. The metric space is chain-connected if for each ε > 0 and
each pair of points p, q ∈ M there is an ε-chain from p to q.

(a) Show that every connected metric space is chain-connected. 

(b) Show that if M is compact and chain-connected then it is connected. 

(c) Is R \ Z chain-connected? 

(d) If M is complete and chain-connected, is it connected? 

79. Prove that if M is nonempty, compact, locally path-connected, and connected 
then it is path-connected. (See Exercise 143, below.) 
80. The Hawaiian earring is the union of circles of radius 1/n and center x =
±1/n on the x-axis, for n ∈ N. See Figure 27 on page 58.

(a) Is it connected?

(b) Path-connected?

(c) Is it homeomorphic to the one-sided Hawaiian earring?

*81. The topologist’s sine curve is the set

{(x, y) : x = 0 and |y| ≤ 1 or 0 < x ≤ 1 and y = sin 1/x}.

See Figure 43. The topologist’s sine circle is shown in Figure 58. (It is the
union of a circular arc and the topologist’s sine curve.) Prove that it is
path-connected but not locally path-connected. (M is locally path-connected
if for each p ∈ M and each neighborhood U of p there is a path-connected
subneighborhood V of p.)



Figure 58 The topologist’s sine circle 

82. The graph of f : M → R is the set {(x, y) ∈ M × R : y = fx}.

(a) If M is connected and f is continuous, prove that the graph of f is
connected.

(b) Give an example to show that the converse is false.

(c) If M is path-connected and f is continuous, show that the graph is
path-connected.

(d) What about the converse?

83. The open cylinder is (0, 1) × S^1. The punctured plane is R^2 \ {0}.

(a) Prove that the open cylinder is homeomorphic to the punctured plane.

(b) Prove that the open cylinder, the double cone, and the plane are not
homeomorphic.

84. Is the closed strip {(x, y) ∈ R^2 : 0 ≤ x ≤ 1} homeomorphic to the closed
half-plane {(x, y) ∈ R^2 : x ≥ 0}? Prove or disprove.

85. Suppose that M is compact and that 𝒰 is an open covering of M which is
“redundant” in the sense that each p ∈ M is contained in at least two members
of 𝒰. Show that 𝒰 reduces to a finite subcovering with the same property.

86. Suppose that every open covering of M has a positive Lebesgue number. Give
an example of such an M that is not compact.


Exercises 87-94 treat the basic theorems in the chapter, avoiding the use of
sequences. The proofs will remain valid in general topological spaces.


87. Give a direct proof that [a, b] is covering compact. [Hint: Let 𝒰 be an open
covering of [a, b] and consider the set

C = {x ∈ [a, b] : finitely many members of 𝒰 cover [a, x]}.

Use the least upper bound principle to show that b ∈ C.]

88. Give a direct proof that a closed subset A of a covering compact set K is covering
compact. [Hint: If 𝒰 is an open covering of A, adjoin the set W = M \ A to 𝒰.
Is 𝒲 = 𝒰 ∪ {W} an open covering of K? If so, so what?]

89. Give a proof of Theorem 36 using open coverings. That is, assume A is a
covering compact subset of M and f : M → N is continuous. Prove directly
that fA is covering compact. [Hint: What is the criterion for continuity in
terms of preimages?]

90. Suppose that f : M → N is a continuous bijection and M is covering compact.
Prove directly that f is a homeomorphism.

91. Suppose that M is covering compact and that f : M → N is continuous. Use
the Lebesgue number lemma to prove that f is uniformly continuous. [Hint:
Consider the covering of N by ε/2-neighborhoods {N_{ε/2}(q) : q ∈ N} and its
preimage in M, {f^pre(N_{ε/2}(q)) : q ∈ N}.]

92. Give a direct proof that the nested decreasing intersection of nonempty covering
compact sets is nonempty. [Hint: If A_1 ⊃ A_2 ⊃ ... are covering compact,
consider the open sets U_n = A_n^c. If ⋂ A_n = ∅, what does {U_n} cover?]

93. Generalize Exercise 92 as follows. Suppose that M is covering compact and 𝒞
is a collection of closed subsets of M such that every intersection of finitely
many members of 𝒞 is nonempty. (Such a collection 𝒞 is said to have the
finite intersection property.) Prove that the grand intersection ⋂_{C ∈ 𝒞} C
is nonempty. [Hint: Consider the collection of open sets 𝒰 = {C^c : C ∈ 𝒞}.]

94. If every collection of closed subsets of M which has the finite intersection
property also has a nonempty grand intersection, prove that M is covering compact.
[Hint: Given an open covering 𝒰 = {U_α}, consider the collection of closed sets
𝒞 = {U_α^c}.]

95. Let S be a subset of a metric space M. With respect to the definitions on
page 92 prove the following.

(a) The closure of S is the intersection of all closed subsets of M that contain
S.

(b) The interior of S is the union of all open subsets of M that are contained
in S.

(c) The boundary of S is a closed set.

(d) Why does (a) imply the closure of S equals lim S?

(e) If S is clopen, what is ∂S?

(f) Give an example of S ⊂ R such that ∂(∂S) ≠ ∅, and infer that “the
boundary of the boundary ∂ ∘ ∂ is not always zero.”

96. If A ⊂ B ⊂ C, A is dense in B, and B is dense in C prove that A is dense in C.

97. Is the set of dyadic rationals (the denominators are powers of 2) dense in Q?
In R? Does one answer imply the other? (Recall that A is dense in B if A ⊂ B
and Ā ⊃ B.)

98. Show that S ⊂ M is somewhere dense in M if and only if int(S̄) ≠ ∅.
Equivalently, S is nowhere dense in M if and only if its closure has empty interior.

99. Let M, N be nonempty metric spaces and P = M × N.

(a) If M, N are perfect prove that P is perfect.

(b) If M, N are totally disconnected prove that P is totally disconnected.

(c) What about the converses?

(d) Infer that the Cartesian product of Cantor spaces is a Cantor space. (We
already know that the Cartesian product of compacts is compact.)

(e) Why does this imply that C × C = {(x, y) ∈ R^2 : x ∈ C and y ∈ C} is
homeomorphic to C, C being the standard Cantor set?

100. Prove that every Cantor piece is a Cantor space. (Recall that M is a Cantor
space if it is compact, nonempty, totally disconnected and perfect, and that
A ⊂ M is a Cantor piece if it is nonempty and clopen.)
*101. Let Σ be the set of all infinite sequences of zeroes and ones. For example,
(100111000011111 ...) ∈ Σ. Define the metric

$$d(a, b) = \sum_{n} \frac{|a_n - b_n|}{2^n}$$

where a = (a_n) and b = (b_n) are points in Σ.

(a) Prove that Σ is compact.

(b) Prove that Σ is homeomorphic to the Cantor set.

102. Prove that no Peano curve is one-to-one. (Recall that a Peano curve is a
continuous map f : [0, 1] → R^2 whose image has a nonempty interior.)

103. Prove that there is a continuous surjection R → R^2. What about R^m?

104. Find two nonhomeomorphic compact subsets of R whose complements are
homeomorphic.

105. As on page 115, consider the subsets of R, 


A = {0} U [1, 2] U {3} and B = {0} U {1} U [2, 3]. 


* 


(a) Why is there no ambient homeomorphism of R to itself that carries A onto
B?

(b) Thinking of R as the x-axis, is there an ambient homeomorphism of R^2 to
itself that carries A onto B?

106. Prove that the completion of a metric space is unique in the following natural
sense: A completion of a metric space M is a complete metric space X containing
M as a metric subspace such that M is dense in X. That is, every point of
X is a limit of M.

(a) Prove that M is dense in the completion M̂ constructed in the proof of
Theorem 80.

(b) If X and X′ are two completions of M prove that there is an isometry
i : X → X′ such that i(p) = p for all p ∈ M.

(c) Prove that i is the unique such isometry.

(d) Infer that M̂ is unique.

107. If M is a metric subspace of a complete metric space S prove that M̄ is a
completion of M.

108. Consider the identity map id : C_max → C_int where C_max is the metric space
C([0, 1], R) of continuous real-valued functions defined on [0, 1], equipped with
the max-metric d_max(f, g) = max |f(x) − g(x)|, and C_int is C([0, 1], R) equipped
with the integral metric,

$$d_{int}(f, g) = \int_0^1 |f(x) - g(x)| \, dx.$$

Show that id is a continuous linear bijection (an isomorphism) but its inverse
is not continuous.
*109. A metric on M is an ultrametric if for all x, y, z ∈ M we have

d(x, z) ≤ max{d(x, y), d(y, z)}.

(Intuitively this means that the trip from x to z cannot be broken into shorter
legs by making a stopover at some y.)

(a) Show that the ultrametric property implies the triangle inequality.

(b) In an ultrametric space show that “all triangles are isosceles.”

(c) Show that a metric space with an ultrametric is totally disconnected.

(d) Define a metric on the set Σ of strings of zeroes and ones in Exercise 101
as

$$d^*(a, b) = \begin{cases} \dfrac{1}{2^n} & \text{if } n \text{ is the smallest index for which } a_n \neq b_n \\ 0 & \text{if } a = b. \end{cases}$$

Show that d* is an ultrametric and prove that the identity map is a
homeomorphism (Σ, d) → (Σ, d*).

*110. Q inherits the Euclidean metric from R but it also carries a very different metric,
the p-adic metric. Given a prime number p and an integer n, the p-adic norm
of n is

$$|n|_p = \frac{1}{p^k}$$

where p^k is the largest power of p that divides n. (The norm of 0 is by definition
0.) The more factors of p, the smaller the p-norm. Similarly, if x = a/b is a
fraction, we factor x as

$$x = p^k \cdot \frac{r}{s}$$

where p divides neither r nor s, and we set

$$|x|_p = \frac{1}{p^k}.$$

The p-adic metric on Q is

$$d_p(x, y) = |x - y|_p.$$

(a) Prove that d_p is a metric with respect to which Q is perfect - every point
is a cluster point.

(b) Prove that d_p is an ultrametric.

(c) Let Q_p be the metric space completion of Q with respect to the metric d_p,
and observe that the extension of d_p to Q_p remains an ultrametric. Infer
from Exercise 109 that Q_p is totally disconnected.

(d) Prove that Q_p is locally compact, in the sense that every point has small
compact neighborhoods.

(e) Infer that Q_p is covered by neighborhoods homeomorphic to the Cantor
set. See Gouvea’s book, p-adic Numbers.
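
For readers who like to experiment before proving, the p-adic norm of Exercise 110 can be computed exactly with rational arithmetic. The following is only a sketch: the function names `padic_norm` and `padic_dist` are illustrative choices, not notation from the text, and p is assumed to be prime.

```python
from fractions import Fraction

def padic_norm(x, p):
    """Return |x|_p as an exact Fraction, assuming p is prime; |0|_p = 0."""
    x = Fraction(x)
    if x == 0:
        return Fraction(0)
    num, den = x.numerator, x.denominator
    k = 0
    # Count the factors of p in the numerator (they make the norm smaller).
    while num % p == 0:
        num //= p
        k += 1
    # Factors of p in the denominator count negatively (they make the norm larger).
    while den % p == 0:
        den //= p
        k -= 1
    return Fraction(1, p) ** k

def padic_dist(x, y, p):
    """The p-adic distance d_p(x, y) = |x - y|_p."""
    return padic_norm(Fraction(x) - Fraction(y), p)

# 12 = 2^2 * 3, so its 2-adic norm is 1/4; 3/4 = 2^(-2) * 3, so its 2-adic norm is 4.
print(padic_norm(12, 2), padic_norm(Fraction(3, 4), 2))
```

Checking the inequality d_p(x, z) ≤ max{d_p(x, y), d_p(y, z)} on a handful of sample fractions is a quick sanity test of part (b), though of course it proves nothing.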

111. Let M = [0, 1] and let M_1 be its division into two intervals [0, 1/2] and [1/2, 1].
Let M_2 be its division into four intervals [0, 1/4], [1/4, 1/2], [1/2, 3/4], and
[3/4, 1]. Continuing these bisections generates natural divisions of [0, 1]. The
pieces are intervals. We label them with words using the letters 0 and 1 as
follows: 0 means “left” and 1 means “right,” so the four intervals in M_2 are
labeled as 00, 01, 10, and 11 respectively.

(a) Verify that all endpoints of the intervals (except 0 and 1) have two ad- 
dresses. For instance, 

1 2 fc - 1 + r 
2 7 ¥ _ ' 

(b) Verify that the points 0, 1, and all nonendpoints have unique addresses. 

*112. Prove that #C = #R. [Hint: According to the Schroeder-Bernstein Theorem
from Chapter 1 it suffices to find injections C → R and R → C. The inclusion
C ⊂ R is an injection C → R. Each t ∈ [0, 1) has a unique base-2 expansion
τ(t) that does not terminate in an infinite string of ones. Replacing each 1 by
2 converts τ(t) to ω(t), an infinite address in the symbols 0 and 2. It does not
terminate in an infinite string of twos. Set h{t) — and verify that 

h : [0, 1) → C is an injection. Since there is an injection R → [0, 1), conclude
that there is an injection R → C, and hence that #C = #R.]

Remark The Continuum Hypothesis states that if S is any uncountable subset 
of R then S and R have equal cardinality. The preceding coding shows that 
C is not only uncountable (as is implied by Theorem 56) but actually has the 
same cardinality as R. That is, C is not a counterexample to the Continuum 
Hypothesis. The same is true of all uncountable closed subsets of R. See 
Exercise 151. 

113. Let M be the standard Cantor set C. In the notation of Section 8, C^n is the
collection of 2^n Cantor intervals of length 1/3^n that nest down to C as n → ∞.
Verify that setting G& = C H C k gives divisions of C into disjoint clopen pieces. 

*114. (a) Prove directly that there is a continuous surjection of the middle-thirds
Cantor set C onto the closed interval [0, 1]. [Hint: Each x ∈ C has a base
3 expansion (x_n), all of whose entries are zeroes and twos. (For example,
2/3 = (2000...)_base 3 and 1/3 = (0222...)_base 3.) Write y = (y_n) by replacing the
twos in (x_n) by ones and interpreting the answer base 2. Show that the
map x ↦ y works.]
(b) Compare this surjection to the one constructed from the bisection divisions
in Exercise 113.
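
The digit substitution in the hint to Exercise 114(a) is easy to try out on finite prefixes of an expansion. The sketch below is an illustration under stated assumptions: the helper names `cantor_point` and `cantor_to_interval` are hypothetical, and a finite list of digits only approximates a point whose expansion is infinite.

```python
from fractions import Fraction

def cantor_point(digits):
    """Value of the finite base-3 expansion 0.d1 d2 ... dn with digits in {0, 2}."""
    if any(d not in (0, 2) for d in digits):
        raise ValueError("Cantor set addresses use only the digits 0 and 2")
    return sum(Fraction(d, 3 ** (n + 1)) for n, d in enumerate(digits))

def cantor_to_interval(digits):
    """Replace each 2 by 1 and read the resulting digits as a base-2 expansion."""
    return sum(Fraction(d // 2, 2 ** (n + 1)) for n, d in enumerate(digits))

# Prefixes of the two expansions quoted in the hint:
# 2/3 = (2000...) and 1/3 = (0222...) in base 3.
right = [2] + [0] * 20
left = [0] + [2] * 20
print(cantor_point(right), cantor_to_interval(right))
print(float(cantor_point(left)), float(cantor_to_interval(left)))
```

The images of the two prefixes differ by only 1/2^21, which suggests why the endpoints 1/3 and 2/3 of a removed gap are sent to the same point and why the map cannot be injective.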

115. Rotate the unit circle S^1 by a fixed angle α, say R : S^1 → S^1. (In polar
coordinates, the transformation R sends (1, θ) to (1, θ + α).)

(a) If α/π is rational, show that each orbit of R is a finite set.

*(b) If α/π is irrational, show that each orbit is infinite and has closure equal
to S^1.

116. A metric space M with metric d can always be remetrized so the metric becomes
bounded. Simply define the bounded metric

$$\rho(p, q) = \frac{d(p, q)}{1 + d(p, q)}.$$

(a) Prove that ρ is a metric. Why is it obviously bounded?

(b) Prove that the identity map M → M is a homeomorphism from M with
the d-metric to M with the ρ-metric.

(c) Infer that boundedness of M is not a topological property. 

(d) Find homeomorphic metric spaces, one bounded and the other not. 
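
Before proving Exercise 116(a) it can be reassuring to test the triangle inequality for the bounded metric on a small sample. The following sketch assumes M = R with d(p, q) = |p − q| and uses a few integer sample points; the names `d`, `rho`, and `violations` are illustrative, and a finite check is evidence, not a proof.

```python
from fractions import Fraction
from itertools import product

def d(p, q):
    """The usual metric on R, restricted here to integer sample points."""
    return abs(p - q)

def rho(p, q):
    """The bounded metric d / (1 + d), computed exactly."""
    return Fraction(d(p, q), 1 + d(p, q))

points = range(-10, 11)

# Collect every sampled triple that would violate the triangle inequality for rho.
violations = [
    (p, q, r)
    for p, q, r in product(points, repeat=3)
    if rho(p, r) > rho(p, q) + rho(q, r)
]
print(len(violations))                    # no violations among the sampled triples
print(rho(0, 1000))                       # large distances are squeezed below 1
```

The second line of output hints at the answer to "Why is it obviously bounded?": the numerator is always smaller than the denominator.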

117. Fold a piece of paper in half. 

(a) Is this a continuous transformation of one rectangle into another? 

(b) Is it injective? 

(c) Draw an open set in the target rectangle, and find its preimage in the 
original rectangle. Is it open? 

(d) What if the open set meets the crease? 

The baker’s transformation is a similar mapping. A rectangle of dough is 
stretched to twice its length and then folded back on itself. Is the transformation 
continuous? A formula for the baker’s transformation in one variable is f(x) =
1 − |1 − 2x|. The nth iterate of f is f^n = f ∘ f ∘ ... ∘ f, n times. The orbit
of a point x is

{x, f(x), f^2(x), ..., f^n(x), ...}.

[For clearer but more awkward notation one can write f^{∘n} instead of f^n. This
distinguishes composition f ∘ f from multiplication f · f.]

(e) If x is rational prove that the orbit of x is a finite set.

(f) If x is irrational what is the orbit?
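
Orbits of the one-variable baker’s transformation in Exercise 117 can be explored exactly with rational arithmetic. This is a sketch under the assumption that the starting point x lies in [0, 1]; the names `baker` and `orbit` and the `max_steps` safeguard are illustrative choices, not part of the exercise.

```python
from fractions import Fraction

def baker(x):
    """One step of the map f(x) = 1 - |1 - 2x|."""
    return 1 - abs(1 - 2 * x)

def orbit(x, max_steps=10_000):
    """List the distinct points x, f(x), f(f(x)), ... until one repeats.

    Assumes x is a Fraction in [0, 1]; max_steps is only a safety cap.
    """
    seen_in_order = []
    seen = set()
    while x not in seen and len(seen_in_order) < max_steps:
        seen_in_order.append(x)
        seen.add(x)
        x = baker(x)
    return seen_in_order

# 1/7 -> 2/7 -> 4/7 -> 6/7 -> 2/7 -> ... so the orbit has four points.
print(orbit(Fraction(1, 7)))
```

Watching the denominators in such experiments is a reasonable way to guess the argument for part (e).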

*118. The implications of compactness are frequently equivalent to it. Prove

(a) If every continuous function f : M → R is bounded then M is compact.

(b) If every continuous bounded function f : M → R achieves a maximum or
minimum then M is compact.

(c) If every continuous function f : M → R has compact range fM then M
is compact.

(d) If every nested decreasing sequence of nonempty closed subsets of M has
nonempty intersection then M is compact.

Together with Theorems 63 and 65, (a)-(d) give seven equivalent definitions of
compactness. [Hint: Reason contrapositively. If M is not compact then it
contains a sequence (p_n) that has no convergent subsequence. It is fair to assume
that the points p_n are distinct. Find radii r_n > 0 such that the neighborhoods
M_{r_n}(p_n) are disjoint and no sequence q_n ∈ M_{r_n}(p_n) has a convergent
subsequence. Using the metric define a function f_n : M_{r_n}(p_n) → R with a spike at
p_n, such as

$$f_n(x) = \frac{r_n - d(x, p_n)}{a_n + d(x, p_n)}$$

where a_n > 0. Set f(x) = f_n(x) if x ∈ M_{r_n}(p_n), and f(x) = 0 if x belongs to
no M_{r_n}(p_n). Show that f is continuous. With the right choice of a_n show that
f is unbounded. With a different choice of a_n, it is bounded but achieves no
maximum, and so on.]

119. Let M be a metric space of diameter ≤ 2. The cone for M is the set

C = C(M) = {p_0} ∪ M × (0, 1]

with the cone metric

$$\begin{aligned} \rho((p, s), (q, t)) &= |s - t| + \min\{s, t\}\, d(p, q) \\ \rho((p, s), p_0) &= s \\ \rho(p_0, p_0) &= 0. \end{aligned}$$

The point p_0 is the vertex of the cone. Prove that ρ is a metric on C. [If M
is the unit circle, think of it in the plane z = 1 in R^3 centered at the point
(0, 0, 1). Its cone is the 45-degree cone with vertex the origin.]

120. Recall that if for each embedding of M, h : M → N, hM is closed in N then
M is said to be absolutely closed. If each hM is bounded then M is absolutely
bounded. Theorem 41 implies that compact sets are absolutely closed and
absolutely bounded. Prove:

(a) If M is absolutely bounded then M is compact.

*(b) If M is absolutely closed then M is compact.

Thus these are two more conditions equivalent to compactness. [Hint: From
Exercise 118(a), if M is noncompact there is a continuous function f : M → R
that is unbounded. For Exercise 120(a), show that F(x) = (x, f(x)) embeds
M onto a nonbounded subset of M × R. For 120(b), justify the additional
assumption that the metric on M is bounded by 2. Then use Exercise 118(b) to
show that if M is noncompact then there is a continuous function g : M → (0, 1]
such that for some nonclustering sequence (p_n), we have g(p_n) → 0 as n → ∞.
Finally, show that G(x) = (x, gx) embeds M onto a nonclosed subset S of the
cone C(M) discussed in Exercise 119. S will be nonclosed because it limits at
p_0 but does not contain it.]

121. (a) Prove that every function defined on a discrete metric space is uniformly
continuous.

(b) Infer that it is false to assert that if every continuous function f : M → R
is uniformly continuous then M is compact.

(c) Prove, however, that if M is a metric subspace of a compact metric space
K and every continuous function f : M → R is uniformly continuous then
M is compact.

122. Recall that p is a cluster point of S if each M_r p contains infinitely many points
of S. The set of cluster points of S is denoted as S′. Prove:

(a) If S ⊂ T then S′ ⊂ T′.

(b) (S ∪ T)′ = S′ ∪ T′.

(c) S′ = (S̄)′.

(d) S′ is closed in M; that is, S″ ⊂ S′ where S″ = (S′)′.

(e) Calculate N′, Q′, R′, (R \ Q)′, and Q″.

(f) Let T be the set of points {1/n : n ∈ N}. Calculate T′ and T″.

(g) Give an example showing that S″ can be a proper subset of S′.

123. Recall that p is a condensation point of S if each M_r p contains uncountably
many points of S. The set of condensation points of S is denoted as S*. Prove:

(a) If S ⊂ T then S* ⊂ T*.

(b) (S ∪ T)* = S* ∪ T*.

(c) S* ⊂ S̄* where S̄* = (S̄)*.

(d) S* is closed in M; that is, S*′ ⊂ S* where S*′ = (S*)′.

(e) S** ⊂ S* where S** = (S*)*.

(f) Calculate N*, Q*, R*, and (R \ Q)*.

(g) Give an example showing that S* can be a proper subset of (S̄)*. Thus,
(c) is not in general an equality.

**(h) Give an example that S** can be a proper subset of S*. Thus, (e) is
not in general an equality. [Hint: Consider the set M of all functions
f : [a, b] → [0, 1], continuous or not, and let the metric on M be the sup
metric, d(f, g) = sup{|f(x) − g(x)| : x ∈ [a, b]}. Consider the set S of all
“δ-functions with rational values.”]

**(i) Give examples that show in general that S*′ neither contains nor is
contained in S′* where S′* = (S′)*. [Hint: δ-functions with values 1/n, n ∈ N.]

124. Recall that p is an interior point of S ⊂ M if some M_r p is contained in S.
The set of interior points of S is the interior of S and is denoted int S. For all
subsets S, T of the metric space M prove:

(a) int S = S \ ∂S.

(b) int S = (closure(S^c))^c.

(c) int(int S) = int S.

(d) int(S ∩ T) = int S ∩ int T.

(e) What are the dual equations for the closure?

(f) Prove that int(S ∪ T) ⊃ int S ∪ int T. Show by example that the inclusion
can be strict, i.e., not an equality.

125. A point p is a boundary point of a set S ⊂ M if every neighborhood M_r p
contains points of both S and S^c. The boundary of S is denoted ∂S. For all
subsets S, T of a metric space M prove:

(a) S is clopen if and only if ∂S = ∅.

(b) ∂S = ∂S^c.

(c) ∂∂S ⊂ ∂S.

(d) ∂∂∂S = ∂∂S.

(e) ∂(S ∪ T) ⊂ ∂S ∪ ∂T.

(f) Give an example in which (c) is a strict inclusion, ∂∂S ≠ ∂S.

(g) What about (e)?

*126. Suppose that E is an uncountable subset of R. Prove that there exists a point
p ∈ R at which E condenses. [Hint: Use decimal expansions. Why must there
be an interval [n, n+1) containing uncountably many points of E? Why must it
contain a decimal subinterval with the same property? (A decimal subinterval
[a, b) has endpoints a = n + k/10, b = n + (k+1)/10 for some digit k, 0 ≤ k ≤ 9.)
Do you see lurking the decimal expansion of a condensation point?] Generalize
to R^2 and to R^m.

127. The metric space M is separable if it contains a countable dense subset. [Note
the confusion of language: “Separable” has nothing to do with “separation.”]

(a) Prove that R^m is separable.

(b) Prove that every compact metric space is separable.

128. *(a) Prove that every metric subspace of a separable metric space is separable,
and deduce that every metric subspace of R^m or of a compact metric space
is separable.

(b) Is the property of being separable topological?

(c) Is the continuous image of a separable metric space separable?

129. Think up a nonseparable metric space.

130. Let ℬ denote the collection of all ε-neighborhoods in R^m whose radius ε is
rational and whose center has all coordinates rational.

(a) Prove that ℬ is countable.

(b) Prove that every open subset of R^m can be expressed as the countable
union of members of ℬ.

(The union need not be disjoint, but it is at most a countable union because
there are only countably many members of ℬ. A collection such as ℬ is called
a countable base for the topology of R^m.)
131. (a) Prove that every separable metric space has a countable base for its topology,
and conversely that every metric space with a countable base for its
topology is separable.

(b) Infer that every compact metric space has a countable base for its topology. 

*132. Referring to Exercise 123, assume now that $M$ is separable, $S \subset M$, and, as
before, $S'$ is the set of cluster points of $S$ while $S^*$ is the set of condensation
points of $S$. Prove:

(a) $S^* \subset (S')^* = (\overline{S})^*$.

(b) $S^{**} = S^{*\prime} = S^*$.

(c) Why is (a) not in general an equality?

[Hints: For (a) write $S \subset (S \setminus S') \cup S'$ and $\overline{S} = (S \setminus S') \cup S'$, show that
$(S \setminus S')^* = \emptyset$, and use Exercise 123(a). For (b), Exercise 123(d) implies that
$S^{**} \subset S^{*\prime} \subset S^*$. To prove that $S^* \subset S^{**}$, write $S \subset (S \setminus S^*) \cup S^*$ and show
that $(S \setminus S^*)^* = \emptyset$.]

*133. Prove that 

(a) An uncountable subset of R clusters at some point of R. 

(b) An uncountable subset of R clusters at some point of itself. 

(c) An uncountable subset of R condenses at uncountably many points of 
itself. 

(d) What about $\mathbb{R}^m$ instead of $\mathbb{R}$?

(e) What about any compact metric space? 

(f) What about any separable metric space? 

[Hint: Review Exercise 126.] 

*134. Prove that $\widehat{\mathbb{Q}}$, the Cauchy sequences in $\mathbb{Q}$ modulo the equivalence relation of
being co-Cauchy, is a field with respect to the natural arithmetic operations
defined on page 122, and that $\mathbb{Q}$ is naturally a subfield of $\widehat{\mathbb{Q}}$.

135. Prove that the order on $\widehat{\mathbb{Q}}$ defined on page 122 is a bona fide order which agrees
with the standard order on $\mathbb{Q}$.

*136. Let $M$ be the square $[0, 1]^2$, and let $aa$, $ba$, $bb$, $ab$ label its four quadrants - upper
right, upper left, lower left, and lower right.

(a) Define nested bisections of the square using this pattern repeatedly, and let
$\tau_k$ be a curve composed of line segments that visit the $k$th-order quadrants
systematically. Let $\tau = \lim_k \tau_k$ be the resulting Peano curve à la the
Cantor Surjection Theorem.

(b) Compare $\tau$ to the Peano curve $f : I \to I^2$ directly constructed on pages
271-274 of the second edition of Munkres’ book Topology.

*137. Let $P$ be a closed perfect subset of a separable complete metric space $M$. Prove
that each point of $P$ is a condensation point of $P$. In symbols, $P = P' \Rightarrow P = P^*$.

**138. Given a Cantor space $M \subset \mathbb{R}^2$, given a line segment $[p, q] \subset \mathbb{R}^2$ with $p, q \notin M$,
and given an $\epsilon > 0$, prove that there exists a path $A$ in the $\epsilon$-neighborhood of
$[p, q]$ that joins $p$ to $q$ and is disjoint from $M$. [Hint: Think of $A$ as a bisector
of $M$. From this bisection fact a dyadic disc partition of $M$ can be constructed,
which leads to the proof that $M$ is tame.]

139. To prove that Antoine’s Necklace A is a Cantor set, you need to show that A 
is compact, perfect, nonempty, and totally disconnected. 

(a) Do so. [Hint: What is the diameter of any connected component of $A^n$,
and what does that imply about $A$?]

**(b) If, in the Antoine construction two linked solid tori are placed very cleverly
inside each larger solid torus, show that the intersection $A = \bigcap A^n$ is a
Cantor set.

*140. Consider the Hilbert cube

$$H = \{(x_1, x_2, \ldots) \in [0, 1]^\infty : \text{for each } n \in \mathbb{N} \text{ we have } |x_n| \le 1/2^n\}.$$

Prove that $H$ is compact with respect to the metric

$$d(x, y) = \sup_n |x_n - y_n|$$

where $x = (x_n)$, $y = (y_n)$. [Hint: Sequences of sequences.]

Remark Although compact, $H$ is infinite-dimensional and is homeomorphic
to no subset of $\mathbb{R}^m$.


141. Prove that the Hilbert cube is perfect and homeomorphic to its Cartesian
square, $H \cong H \times H$.

***142. Assume that $M$ is compact, nonempty, perfect, and homeomorphic to its Cartesian
square, $M \cong M \times M$. Must $M$ be homeomorphic to the Cantor set, the
Hilbert cube, or some combination of them?

143. A Peano space is a metric space $M$ that is the continuous image of the unit
interval: There is a continuous surjection $\tau : [0, 1] \to M$. Theorem 72 states the
amazing fact that the 2-disc is a Peano space. Prove that every Peano space is

(a) compact, 

(b) nonempty, 

(c) path-connected, 

*(d) and locally path-connected, in the sense that for each $p \in M$ and each
neighborhood U of p there is a smaller neighborhood V of p such that any 
two points of V can be joined by a path in U . 

*144. The converse to Exercise 143 is the Hahn-Mazurkiewicz Theorem. Assume
that a metric space $M$ is compact, nonempty, path-connected, and locally
path-connected. Use the Cantor Surjection Theorem 70 to show that $M$ is a
Peano space. [The key is to make uniformly short paths to fill in the gaps of
$[0, 1] \setminus C$.]
145. One of the famous theorems in plane topology is the Jordan Curve Theorem.
A Jordan curve $J$ is a homeomorph of the unit circle in the plane. (Equivalently
it is $f([a, b])$ where $f : [a, b] \to \mathbb{R}^2$ is continuous, $f(a) = f(b)$, and for no
other pair of distinct $s, t \in [a, b]$ does $f(s)$ equal $f(t)$. It is also called a simple
closed curve.) The Jordan Curve Theorem asserts that $\mathbb{R}^2 \setminus J$ consists of two
disjoint, connected open sets, its inside and its outside, and every path between
them must meet $J$. Prove the Jordan Curve Theorem for the circle, the square,
the triangle, and - if you have courage - every simple closed polygon.

146. The utility problem gives three houses 1, 2, 3 in the plane and the three 
utilities, Gas, Water, and Electricity. You are supposed to connect each house 
to the three utilities without crossing utility lines. (The houses and utilities are 
disjoint.) 

(a) Use the Jordan curve theorem to show that there is no solution to the 
utility problem in the plane. 

*(b) Show also that the utility problem cannot be solved on the 2-sphere $S^2$.

*(c) Show that the utility problem can be solved on the surface of the torus.

*(d) What about the surface of the Klein bottle?

***(e) Given utilities $U_1, \ldots, U_m$ and houses $H_1, \ldots, H_n$ located on a surface
with $g$ handles, find necessary and sufficient conditions on $m$, $n$, $g$ so that
the utility problem can be solved.

147. Let $M$ be a metric space and let $\mathcal{K}$ denote the class of nonempty compact
subsets of $M$. The $r$-neighborhood of $A \in \mathcal{K}$ is

$$M_r A = \{x \in M : \exists a \in A \text{ and } d(x, a) < r\} = \bigcup_{a \in A} M_r a.$$

For $A, B \in \mathcal{K}$ define

$$D(A, B) = \inf\{r > 0 : A \subset M_r B \text{ and } B \subset M_r A\}.$$

(a) Show that $D$ is a metric on $\mathcal{K}$. (It is called the Hausdorff metric and $\mathcal{K}$
is called the hyperspace of $M$.)

(b) Denote by $\mathcal{F}$ the collection of finite nonempty subsets of $M$ and prove that
$\mathcal{F}$ is dense in $\mathcal{K}$. That is, given $A \in \mathcal{K}$ and given $\epsilon > 0$ show there exists
$F \in \mathcal{F}$ such that $D(A, F) < \epsilon$.

*(c) If $M$ is compact prove that $\mathcal{K}$ is compact.

(d) If $M$ is connected prove that $\mathcal{K}$ is connected.

**(e) If $M$ is path-connected is $\mathcal{K}$ path-connected?

(f) Do homeomorphic metric spaces have homeomorphic hyperspaces?

Remark The converse to (f), $\mathcal{K}(M) \cong \mathcal{K}(N) \Rightarrow M \cong N$, is false. The
hyperspace of every Peano space is the Hilbert cube. This is a difficult
result but a good place to begin reading about hyperspaces is Sam Nadler’s
book Continuum Theory.
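
To get a feel for the Hausdorff metric before proving anything, it may help to compute it for finite subsets of the plane. The sketch below assumes the sets are finite lists of coordinate tuples, in which case the infimum in the definition of $D$ equals the larger of the two "directed" distances; the names `directed` and `hausdorff` are illustrative and not from the text.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def directed(A, B):
    """Largest distance from a point of A to its nearest point of B.

    For finite sets this is the infimum of the r with A inside M_r B.
    """
    return max(min(dist(a, b) for b in B) for a in A)


def hausdorff(A, B):
    """Hausdorff distance D(A, B) between two finite nonempty point sets."""
    return max(directed(A, B), directed(B, A))


A = [(0.0, 0.0), (1.0, 0.0)]
B = [(0.0, 0.0), (3.0, 0.0)]
print(directed(A, B), directed(B, A), hausdorff(A, B))
```

Note that the two directed distances differ here, which is why the definition demands both inclusions $A \subset M_r B$ and $B \subset M_r A$.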
**148. Start with a set $S \subset \mathbb{R}$ and successively take its closure, the complement of
its closure, the closure of that, and so on: $S, \operatorname{cl}(S), (\operatorname{cl}(S))^c, \ldots$. Do the same
to $S^c$. In total, how many distinct subsets of $\mathbb{R}$ can be produced this way?
In particular decide whether each chain $S, \operatorname{cl}(S), \ldots$ consists of only finitely
many sets. For example, if $S = \mathbb{Q}$ then we get $\mathbb{Q}, \mathbb{R}, \emptyset, \emptyset, \mathbb{R}, \mathbb{R}, \ldots$ and
$\mathbb{Q}^c, \mathbb{R}, \emptyset, \emptyset, \mathbb{R}, \mathbb{R}, \ldots$ for a total of four sets.

**149. Consider the letter T. 

(a) Prove that there is no way to place uncountably many copies of the letter 
T disjointly in the plane. [Hint: First prove this when the unit square 
replaces the plane.] 

(b) Prove that there is no way to place uncountably many homeomorphic 
copies of the letter T disjointly in the plane. 

(c) For which other letters of the alphabet is this true? 

(d) Let $U$ be a set in $\mathbb{R}^3$ formed like an umbrella: It is a disc with a perpendicular
segment attached to its center. Prove that uncountably many copies
of $U$ cannot be placed disjointly in $\mathbb{R}^3$.

(e) What if the perpendicular segment is attached to the boundary of the 
disc? 

**150. Let $M$ be a complete, separable metric space such as $\mathbb{R}^m$. Prove the Cupcake
Theorem: Each closed set $K \subset M$ can be expressed uniquely as the disjoint
union of a countable set and a perfect closed set,

$$C \sqcup P = K.$$

**151. Let M be an uncountable compact metric space. 

(a) Prove that $M$ contains a homeomorphic copy of the Cantor set. [Hint:
Imitate the construction of the standard Cantor set $C$.]

(b) Infer that Cantor sets are ubiquitous. There is a continuous surjection
$\sigma : C \to M$ and there is a continuous injection $i : C \to M$.

(c) Infer that every uncountable closed set $S \subset \mathbb{R}$ has $\#S = \#\mathbb{R}$, and hence
that the Continuum Hypothesis is valid for closed sets in $\mathbb{R}$. [Hint: Cupcake
and Exercise 112.]

(d) Is the same true if M is separable, uncountable, and complete? 

**152. Write jingles at least as good as the following. Pay attention to the meter as 
well as the rhyme. 

When a set in the plane 
is closed and bounded, 
you can always draw 
a curve around it. 


Peter Pribik 
If a clopen set can be detected,

Your metric space is disconnected. 

David Owens 


A coffee cup feeling quite dazed, 
said to a donut, amazed, 
an open surjective continuous injection, 
You’d be plastic and I’d be glazed. 

Norah Esty


’Tis a most indisputable fact 

If you want to make something compact 

Make it bounded and closed 

For you’re totally hosed 

If either condition you lack. 

Lest the reader infer an untruth 

(Which I think would be highly uncouth) 

I must hasten to add 

There are sets to be had 

Where the converse is false, fo’sooth. 

Karla Westfahl 


For ev’ry a and b in S 

if there exists a path that’s straight 

from a to b and it’s inside 

then “S must be convex,” we state.


Alex Wang 
Prelim Problems†

1. Suppose that $f : \mathbb{R}^m \to \mathbb{R}$ satisfies two conditions:

(i) For each compact set $K$, $f(K)$ is compact.

(ii) For every nested decreasing sequence of compacts $(K_n)$,

$$f\left(\bigcap K_n\right) = \bigcap f(K_n).$$

Prove that $f$ is continuous.

2. Let $X \subset \mathbb{R}^m$ be compact and $f : X \to \mathbb{R}$ be continuous. Given $\epsilon > 0$, show
that there is a constant $M$ such that for all $x, y \in X$ we have $|f(x) - f(y)| \le M|x - y| + \epsilon$.

3. Consider $f : \mathbb{R}^2 \to \mathbb{R}$. Assume that for each fixed $x_0$, $y \mapsto f(x_0, y)$ is continuous
and for each fixed $y_0$, $x \mapsto f(x, y_0)$ is continuous. Find such an $f$ that is not
continuous.

4. Let $f : \mathbb{R}^2 \to \mathbb{R}$ satisfy the following properties. For each fixed $x_0 \in \mathbb{R}$ the
function $y \mapsto f(x_0, y)$ is continuous and for each fixed $y_0 \in \mathbb{R}$ the function
$x \mapsto f(x, y_0)$ is continuous. Also assume that if $K$ is any compact subset of $\mathbb{R}^2$
then $f(K)$ is compact. Prove that $f$ is continuous.

5. Let $f(x, y)$ be a continuous real-valued function defined on the unit square
$[0, 1] \times [0, 1]$. Prove that

$$g(x) = \max\{f(x, y) : y \in [0, 1]\}$$

is continuous.

6. Let $\{U_k\}$ be a cover of $\mathbb{R}^m$ by open sets. Prove that there is a cover $\{V_k\}$ of $\mathbb{R}^m$
by open sets such that $V_k \subset U_k$ and each compact subset of $\mathbb{R}^m$ is disjoint
from all but finitely many of the $V_k$.

7. A function $f : [0, 1] \to \mathbb{R}$ is said to be upper semicontinuous if given $x \in [0, 1]$
and $\epsilon > 0$ there exists a $\delta > 0$ such that $|y - x| < \delta$ implies that $f(y) \le f(x) + \epsilon$.
Prove that an upper semicontinuous function on $[0, 1]$ is bounded above and
attains its maximum value at some point $p \in [0, 1]$.

8. Prove that a continuous function $f : \mathbb{R} \to \mathbb{R}$ which sends open sets to open sets
must be monotonic.

9. Show that [0, 1] cannot be written as a countably infinite union of disjoint closed 
subintervals. 

10. A connected component of a metric space $M$ is a maximal connected subset
of $M$. Give an example of $M \subset \mathbb{R}$ having uncountably many connected components.
Can such a subset be open? Closed? Does your answer change if $\mathbb{R}^2$
replaces $\mathbb{R}$?


† These are questions taken from the exam given to first-year math graduate students at U.C.
Berkeley.
11. Let $U \subset \mathbb{R}^m$ be an open set. Suppose that the map $h : U \to \mathbb{R}^m$ is a homeomorphism
from $U$ onto $\mathbb{R}^m$ which is uniformly continuous. Prove that $U = \mathbb{R}^m$.

12. Let $X$ be a nonempty connected set of real numbers. If every element of $X$ is
rational prove that $X$ has only one element.

13. Let $A \subset \mathbb{R}^m$ be compact, $x \in A$. Let $(x_n)$ be a sequence in $A$ such that every
convergent subsequence of $(x_n)$ converges to $x$.

(a) Prove that the sequence $(x_n)$ converges.

(b) Give an example to show if $A$ is not compact, the result in (a) is not
necessarily true.

14. Assume that $f : \mathbb{R} \to \mathbb{R}$ is uniformly continuous. Prove that there are constants
$A$, $B$ such that $|f(x)| \le A + B|x|$ for all $x \in \mathbb{R}$.

15. Let $h : [0, 1) \to \mathbb{R}$ be a uniformly continuous function where $[0, 1)$ is the half-open
interval. Prove that there is a unique continuous map $g : [0, 1] \to \mathbb{R}$ such
that $g(x) = h(x)$ for all $x \in [0, 1)$.


3 

Functions of a Real Variable 


1 Differentiation 

The function $f : (a, b) \to \mathbb{R}$ is differentiable at $x$ if

$$\lim_{t \to x} \frac{f(t) - f(x)}{t - x} = L$$

exists. This means $L$ is a real number and for each $\epsilon > 0$ there exists a $\delta > 0$ such
that if $0 < |t - x| < \delta$ then the differential quotient above differs from $L$ by $< \epsilon$.
The limit $L$ is the derivative of $f$ at $x$, $L = f'(x)$. In calculus language, $\Delta x = t - x$
is the change in the independent variable $x$ while $\Delta f = f(t) - f(x)$ is the resulting
change in the dependent variable $y = f(x)$. Differentiability at $x$ means that

$$f'(x) = \lim_{\Delta x \to 0} \frac{\Delta f}{\Delta x}.$$

We begin by reviewing the proofs of some standard calculus facts. 


1 The Rules of Differentiation 


(a) Differentiability implies continuity. 

(b) If $f$ and $g$ are differentiable at $x$ then so is $f + g$, the derivative being

$$(f + g)'(x) = f'(x) + g'(x).$$

(c) If $f$ and $g$ are differentiable at $x$ then so is their product $f \cdot g$, the derivative
being given by the Leibniz Formula

$$(f \cdot g)'(x) = f'(x) \cdot g(x) + f(x) \cdot g'(x).$$

(d) The derivative of a constant is zero, $c' = 0$.

(e) If $f$ and $g$ are differentiable at $x$ and $g(x) \neq 0$ then their ratio $f/g$ is differentiable
at $x$, the derivative being

$$\left(\frac{f}{g}\right)'(x) = \frac{f'(x) \cdot g(x) - f(x) \cdot g'(x)}{g(x)^2}.$$

(f) If $f$ is differentiable at $x$ and $g$ is differentiable at $y = f(x)$ then their composite
$g \circ f$ is differentiable at $x$, the derivative being given as the Chain Rule

$$(g \circ f)'(x) = g'(y) \cdot f'(x).$$


Proof (a) Continuity in the calculus notation amounts to the assertion that $\Delta f \to 0$
as $\Delta x \to 0$. This is obvious: If the fraction $\Delta f / \Delta x$ tends to a finite limit while its
denominator tends to zero, then its numerator must also tend to zero.


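(b) Since $\Delta(f + g) = \Delta f + \Delta g$, we have

$$\frac{\Delta(f + g)}{\Delta x} = \frac{\Delta f}{\Delta x} + \frac{\Delta g}{\Delta x} \to f'(x) + g'(x)$$

as $\Delta x \to 0$.

(c) Since $\Delta(f \cdot g) = \Delta f \cdot g(x + \Delta x) + f(x) \cdot \Delta g$, continuity of $g$ at $x$ implies that

$$\frac{\Delta(f \cdot g)}{\Delta x} = \frac{\Delta f}{\Delta x}\, g(x + \Delta x) + f(x)\, \frac{\Delta g}{\Delta x} \to f'(x) g(x) + f(x) g'(x)$$

as $\Delta x \to 0$.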


(d) If $c$ is a constant then $\Delta c = 0$ and $c' = 0$.

(e) Since

$$\Delta(f/g) = \frac{g(x) \Delta f - f(x) \Delta g}{g(x + \Delta x)\, g(x)}$$

the formula follows when we divide by $\Delta x$ and take the limit.

(f) The shortest proof of the chain rule for $y = f(x)$ is by cancellation:

$$\frac{\Delta g}{\Delta x} = \frac{\Delta g}{\Delta y} \frac{\Delta y}{\Delta x} \to g'(y) f'(x).$$


A slight flaw is present: $\Delta y$ may be zero when $\Delta x$ is not. This is not a big problem.
Differentiability of $g$ at $y$ implies that

$$\frac{\Delta g}{\Delta y} = g'(y) + \sigma$$

where $\sigma = \sigma(\Delta y) \to 0$ as $\Delta y \to 0$. Define $\sigma(0) = 0$. The formula

$$\Delta g = (g'(y) + \sigma) \Delta y$$

holds for all small $\Delta y$, including $\Delta y = 0$. Continuity of $f$ at $x$ (which is true by (a))
implies that $\Delta f \to 0$ as $\Delta x \to 0$. Thus

$$\frac{\Delta g}{\Delta x} = (g'(y) + \sigma(\Delta f)) \frac{\Delta y}{\Delta x} \to g'(y) f'(x)$$

as $\Delta x \to 0$.


□ 


2 Corollary The derivative of a polynomial $a_0 + a_1 x + \cdots + a_n x^n$ exists at every
$x \in \mathbb{R}$ and equals $a_1 + 2 a_2 x + \cdots + n a_n x^{n-1}$.

Proof Immediate from the differentiation rules. □ 


A function $f : (a, b) \to \mathbb{R}$ that is differentiable at each $x \in (a, b)$ is differentiable.


3 Mean Value Theorem A continuous function $f : [a, b] \to \mathbb{R}$ that is differentiable
on the interval $(a, b)$ has the mean value property: There exists a point $\theta \in (a, b)$
such that

$$f(b) - f(a) = f'(\theta)(b - a).$$


4 Lemma If $f : (a, b) \to \mathbb{R}$ is differentiable and achieves a minimum or maximum
at some $\theta \in (a, b)$ then $f'(\theta) = 0$.

Proof Assume that $f$ has a minimum at $\theta$. The derivative $f'(\theta)$ is the limit of the
differential quotient $(f(t) - f(\theta))/(t - \theta)$ as $t \to \theta$. Since $f(t) \ge f(\theta)$ for all $t \in (a, b)$,
the differential quotient is nonnegative for $t > \theta$ and nonpositive for $t < \theta$. Thus
$f'(\theta)$ is a limit of both nonnegative and nonpositive quantities, so $f'(\theta) = 0$. Similarly
$f'(\theta) = 0$ when $f$ has a maximum at $\theta$. □


Proof of the Mean Value Theorem See Figure 59, where

$$S = \frac{f(b) - f(a)}{b - a}$$

is the slope of the secant of the graph of $f$. The function $\phi(x) = f(x) - S(x - a)$ is
continuous on $[a, b]$ and differentiable on $(a, b)$. It has the same value, namely $f(a)$,
at $x = a$ and $x = b$. Since $[a, b]$ is compact $\phi$ takes on maximum and minimum
values, and since it has the same value at both endpoints, $\phi$ has a maximum or a
minimum that occurs at an interior point $\theta \in (a, b)$. See Figure 60. By Lemma 4 we
have $\phi'(\theta) = 0$ and $f(b) - f(a) = f'(\theta)(b - a)$. □
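
The theorem only asserts that a mean value point $\theta$ exists, but in simple cases one can locate it numerically. The sketch below assumes that $f'$ is continuous and that $f'(a) - S$ and $f'(b) - S$ have opposite signs, so that bisection applies; these extra hypotheses, the example $f(x) = x^3$ on $[0, 2]$, and the name `mean_value_point` are assumptions of the illustration, not part of the theorem.

```python
def mean_value_point(f, fprime, a, b, tol=1e-12):
    """Find theta in (a, b) with f'(theta) = (f(b) - f(a)) / (b - a) by bisection.

    Assumes f' is continuous and f' - S changes sign between a and b.
    """
    S = (f(b) - f(a)) / (b - a)           # slope of the secant line
    lo, hi = a, b
    if (fprime(lo) - S) * (fprime(hi) - S) > 0:
        raise ValueError("f' - S does not change sign on [a, b]")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if (fprime(lo) - S) * (fprime(mid) - S) <= 0:
            hi = mid                      # sign change lies in [lo, mid]
        else:
            lo = mid                      # sign change lies in [mid, hi]
    return (lo + hi) / 2


def f(x):
    return x ** 3


def fprime(x):
    return 3 * x * x


a, b = 0.0, 2.0
theta = mean_value_point(f, fprime, a, b)
print(theta, f(b) - f(a), fprime(theta) * (b - a))
```

Here $S = 4$ and $3\theta^2 = 4$, so the computed point should agree with $\theta = 2/\sqrt{3}$.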
Figure 59 The secant line for the graph of $f$

Figure 60 $\phi'(\theta) = 0$.


5 Corollary If $f$ is differentiable and $|f'(x)| \le M$ for all $x \in (a, b)$ then $f$ satisfies
the global Lipschitz condition - for all $t, x \in (a, b)$ we have

$$|f(t) - f(x)| \le M|t - x|.$$

In particular, if $f'(x) = 0$ for all $x \in (a, b)$ then $f(x)$ is constant.


Proof $|f(t) - f(x)| = |f'(\theta)(t - x)|$ for some $\theta$ between $x$ and $t$.


□ 


Remark The Mean Value Theorem and this corollary are the most important tools 
in calculus for making estimates. 


It is often convenient to deal with two functions simultaneously, and for that we 
have the following result. 

6 Ratio Mean Value Theorem Suppose that the functions $f$ and $g$ are continuous
on an interval $[a, b]$ and differentiable on the interval $(a, b)$. Then there is a $\theta \in (a, b)$
such that

$$\Delta f \cdot g'(\theta) = \Delta g \cdot f'(\theta)$$



where $\Delta f = f(b) - f(a)$ and $\Delta g = g(b) - g(a)$. (If $g(x) = x$, the Ratio Mean Value
Theorem becomes the ordinary Mean Value Theorem.)


Proof If $\Delta g \neq 0$ then the theorem states that for some $\theta \in (a, b)$ we have

$$\frac{\Delta f}{\Delta g} = \frac{f'(\theta)}{g'(\theta)}.$$

This ratio expression is how to remember the theorem. The whole point here is that
$f'$ and $g'$ are evaluated at the same $\theta$. The function

$$\Phi(x) = \Delta f \cdot (g(x) - g(a)) - \Delta g \cdot (f(x) - f(a))$$

is differentiable and its value at both endpoints $a$, $b$ is 0. Since $\Phi$ is continuous it
takes on a maximum and a minimum somewhere in the interval $[a, b]$. Since $\Phi$ has
equal values at the endpoints of the interval, it must take on a maximum or minimum
at some point $\theta \in (a, b)$; i.e., $\theta \neq a, b$. Then $\Phi'(\theta) = 0$ and $\Delta f \cdot g'(\theta) = \Delta g \cdot f'(\theta)$ as
claimed. □


7 L’Hôpital’s Rule If $f$ and $g$ are differentiable functions defined on an interval
$(a, b)$, both of which tend to 0 at $b$, and if the ratio of their derivatives $f'(x)/g'(x)$
tends to a finite limit $L$ at $b$ then $f(x)/g(x)$ also tends to $L$ at $b$. (We assume that
$g(x), g'(x) \neq 0$.)

Rough Proof Let $x \in (a, b)$ tend to $b$. Imagine a point $t \in (a, b)$ tending to $b$
much faster than $x$ does. It is a kind of “advance guard” for $x$. Then $f(t)/f(x)$ and
$g(t)/g(x)$ are as small as we wish, and by the Ratio Mean Value Theorem there is a
$\theta \in (x, t)$ such that

$$\frac{f(x)}{g(x)} = \frac{f(x) - 0}{g(x) - 0} \doteq \frac{f(x) - f(t)}{g(x) - g(t)} = \frac{f'(\theta)}{g'(\theta)}.$$

The latter tends to $L$ because $\theta$ is sandwiched between $x$ and $t$ as they tend to $b$.
The symbol $\doteq$ means approximately equal. See Figure 61. □


Figure 61 $x$ and $t$ escort $\theta$ toward $b$.
Complete Proof Given $\epsilon > 0$ we must find $\delta > 0$ such that if $|x - b| < \delta$ then
$|f(x)/g(x) - L| < \epsilon$. Since $f'(x)/g'(x)$ tends to $L$ as $x$ tends to $b$ there does exist
$\delta > 0$ such that if $x \in (b - \delta, b)$ then

$$\left| \frac{f'(x)}{g'(x)} - L \right| < \frac{\epsilon}{2}.$$

For each $x \in (b - \delta, b)$ determine a point $t \in (b - \delta, b)$ which is so near to $b$ that

$$|f(t)| + |g(t)| < \frac{g(x)^2 \epsilon}{4(|f(x)| + |g(x)|)}$$

$$|g(t)| < \frac{|g(x)|}{2}.$$

Since $f(t)$ and $g(t)$ tend to 0 as $t$ tends to $b$, and since $g(x) \neq 0$ such a $t$ exists. It
depends on $x$, of course. By this choice of $t$ and the Ratio Mean Value Theorem we
have

$$\left| \frac{f(x)}{g(x)} - L \right| = \left| \frac{f(x)}{g(x)} - \frac{f(x) - f(t)}{g(x) - g(t)} + \frac{f(x) - f(t)}{g(x) - g(t)} - L \right|$$

$$\le \left| \frac{g(x) f(t) - f(x) g(t)}{g(x)(g(x) - g(t))} \right| + \left| \frac{f'(\theta)}{g'(\theta)} - L \right| < \epsilon,$$

which completes the proof that $f(x)/g(x) \to L$ as $x \to b$.

□


It is clear that L’Hôpital’s Rule holds equally well as $x$ tends to $b$ or to $a$. It
is also true that it holds when $x$ tends to $\pm\infty$ or when $f$ and $g$ tend to $\pm\infty$. See
Exercises 6 and 7.


From now on feel free to use L’Hôpital’s Rule!
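
A quick numerical sanity check of the rule, offered as a sketch rather than a proof: take $f(x) = 1 - \cos x$ and $g(x) = x^2$ on the interval $(-1, 0)$ with $b = 0$. Both tend to 0 at $b$, neither $g$ nor $g'$ vanishes on the interval, and $f'(x)/g'(x) = \sin x / (2x)$ tends to $1/2$. The example functions are assumptions chosen for illustration.

```python
import math


def f(x):
    return 1.0 - math.cos(x)


def g(x):
    return x * x


def fprime(x):
    return math.sin(x)


def gprime(x):
    return 2.0 * x


# Points of (-1, 0) approaching b = 0 from the left.
xs = [-(10.0 ** -k) for k in range(1, 5)]
ratios = [f(x) / g(x) for x in xs]
derivative_ratios = [fprime(x) / gprime(x) for x in xs]

for x, r, d in zip(xs, ratios, derivative_ratios):
    print(f"x = {x:+.0e}   f/g = {r:.8f}   f'/g' = {d:.8f}")
```

Both columns should settle near $L = 1/2$, as the rule predicts.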

8 Theorem If $f$ is differentiable on $(a, b)$ then its derivative function $f'(x)$ has the
intermediate value property.

Differentiability of $f$ implies continuity of $f$, and so the Intermediate Value Theorem
from Chapter 2 applies to $f$ and states that $f$ takes on all intermediate values,
but this is not what Theorem 8 is about. Not at all. Theorem 8 concerns $f'$ not
$f$. The function $f'$ can well be discontinuous, but nevertheless it too takes on all
intermediate values. In a clear abuse of language, functions like $f'$ possessing the
intermediate value property are called Darboux continuous, even when they are
discontinuous! Darboux was the first to realize how badly discontinuous a derivative
function can be. Despite the fact that $f'$ has the intermediate value property, it can
be discontinuous at almost every point of $[a, b]$. Strangely enough, however, $f'$ cannot
be discontinuous at every point. If $f$ is differentiable, $f'$ must be continuous at
a dense, thick set of points. See Exercise 25 and the next section for the definitions.

Proof of Theorem 8 Suppose that $a < x_1 < x_2 < b$ and

$$\alpha = f'(x_1) < \gamma < f'(x_2) = \beta.$$

We must find $\theta \in (x_1, x_2)$ such that $f'(\theta) = \gamma$.

Choose a small $h$, $0 < h < x_2 - x_1$, and draw the secant segment $\sigma(x)$ between
the points $(x, f(x))$ and $(x + h, f(x + h))$ on the graph of $f$. Slide $x$ from $x_1$ to $x_2 - h$
continuously. This is the sliding secant method. See Figure 62.



Figure 62 The sliding secant 


When $h$ is small enough, slope $\sigma(x_1) \doteq f'(x_1)$ and slope $\sigma(x_2 - h) \doteq f'(x_2)$. Thus

$$\text{slope } \sigma(x_1) < \gamma < \text{slope } \sigma(x_2 - h).$$

Continuity of $f$ implies that the slope of $\sigma(x)$ depends continuously on $x$, so by the
Intermediate Value Theorem for continuous functions there is an $x \in (x_1, x_2 - h)$
with slope $\sigma(x) = \gamma$. The Mean Value Theorem then gives a $\theta \in (x, x + h)$ such that
$f'(\theta) = \gamma$. □

9 Corollary The derivative of a differentiable function never has a jump disconti- 
nuity. 


Proof Near a jump, a function omits intermediate values. 


□ 
Pathological Examples

Nonjump discontinuities of $f'$ may very well occur. The function

$$f(x) = \begin{cases} x^2 \sin \dfrac{1}{x} & \text{if } x > 0 \\ 0 & \text{if } x \le 0 \end{cases}$$

is differentiable everywhere, even at $x = 0$, where $f'(0) = 0$. Its derivative function
for $x > 0$ is

$$f'(x) = 2x \sin \frac{1}{x} - \cos \frac{1}{x},$$

which oscillates more and more rapidly with amplitude approximately 1 as $x \to 0$.
Since $f'(x) \not\to 0$ as $x \to 0$, $f'$ is discontinuous at $x = 0$. Figure 63 shows why $f$ is
differentiable at $x = 0$ and has $f'(0) = 0$. Although the graph oscillates wildly at 0,
it does so between the envelopes $y = \pm x^2$, and any curve between these envelopes is
tangent to the $x$-axis at the origin. Study this example, Figure 63.
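
The two claims about this example can be checked numerically; the following is only an illustrative sketch. The differential quotient of $f$ at 0 is $f(h)/h = h \sin(1/h)$, which is squeezed to 0 by the envelopes, while along the sample points $x_n = 1/(2\pi n)$ (chosen here for convenience) the derivative stays near $-1$ instead of approaching $f'(0) = 0$.

```python
import math


def f(x):
    return x * x * math.sin(1.0 / x) if x > 0 else 0.0


def fprime(x):
    """Derivative of f: the closed formula for x > 0, and 0 for x <= 0."""
    if x > 0:
        return 2.0 * x * math.sin(1.0 / x) - math.cos(1.0 / x)
    return 0.0


# Differential quotients at 0: (f(h) - f(0)) / h, bounded by h in size.
steps = [10.0 ** -k for k in range(1, 8)]
quotients = [(f(h) - f(0.0)) / h for h in steps]

# Derivative sampled at x_n = 1 / (2 pi n), where cos(1/x_n) is 1.
samples = [fprime(1.0 / (2.0 * math.pi * n)) for n in (1, 10, 100, 1000)]

print(quotients)
print(samples)
```

The first list shrinks to 0, while the second hovers near $-1$, so $f'$ has no limit at 0 even though $f'(0)$ exists.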




Figure 63 The graphs of the function $y = x^2 \sin(1/x)$ and its envelopes
$y = \pm x^2$; and the graph of its derivative

A similar but worse example is illustrated in Figure 64, where

$$g(x) = \begin{cases} x^{3/2} \sin \dfrac{1}{x} & \text{if } x > 0 \\ 0 & \text{if } x \le 0. \end{cases}$$

Its derivative at $x = 0$ is $g'(0) = 0$, while at $x > 0$ its derivative is

$$g'(x) = \frac{3}{2} \sqrt{x} \sin \frac{1}{x} - \frac{1}{\sqrt{x}} \cos \frac{1}{x},$$

which oscillates with increasing frequency and unbounded amplitude as $x \to 0$ because
$1/\sqrt{x}$ blows up at $x = 0$.
Figure 64 The function $y = x^{3/2} \sin(1/x)$, its envelopes $y = \pm x^{3/2}$, and its
derivative.



Higher Derivatives 

The derivative of $f'$, if it exists, is the second derivative of $f$,

$$(f')'(x) = f''(x) = \lim_{t \to x} \frac{f'(t) - f'(x)}{t - x}.$$

Higher derivatives are defined inductively and written $f^{(r)} = (f^{(r-1)})'$. If $f^{(r)}(x)$
exists then $f$ is $r$th-order differentiable at $x$. If $f^{(r)}(x)$ exists for each $x \in (a, b)$
then $f$ is $r$th-order differentiable. If $f^{(r)}(x)$ exists for all $r$ and all $x$ then $f$
is infinitely differentiable or smooth. The zeroth derivative of $f$ is $f$ itself,
$f^{(0)}(x) = f(x)$.

10 Theorem If $f$ is $r$th-order differentiable and $r \ge 1$ then $f^{(r-1)}(x)$ is a continuous
function of $x \in (a, b)$.


Proof Differentiability implies continuity and $f^{(r-1)}(x)$ is differentiable.


□ 


11 Corollary A smooth function is continuous. Each derivative of a smooth func- 
tion is smooth and hence continuous. 


Proof Obvious from the definition of smoothness and Theorem 10. 


□ 


Smoothness Classes 


If $f$ is differentiable and its derivative function $f'(x)$ is a continuous function of $x$
then $f$ is continuously differentiable and we say that $f$ is of class $C^1$. If $f$ is $r$th-order
differentiable and $f^{(r)}(x)$ is a continuous function of $x$ then $f$ is continuously
$r$th-order differentiable and we say that $f$ is of class $C^r$. If $f$ is smooth then by
the preceding corollary it is of class $C^r$ for all finite $r$ and we say that $f$ is of class
$C^\infty$. To round out the notation we say that a continuous function is of class $C^0$.

Thinking of $C^r$ as the set of functions of class $C^r$, we have the regularity hierarchy

$$C^0 \supset C^1 \supset \cdots \supset \bigcap_{r \in \mathbb{N}} C^r = C^\infty.$$

Each inclusion $C^r \supset C^{r+1}$ is proper. There exist continuous functions that are not
of class $C^1$, $C^1$ functions that are not of class $C^2$, and so on. For example,

$f(x) = |x|$ is of class $C^0$ but not of class $C^1$,
$f(x) = x|x|$ is of class $C^1$ but not of class $C^2$,
$f(x) = |x|^3$ is of class $C^2$ but not of class $C^3$.

Analytic Functions 

A function that can be expressed locally as a convergent power series is analytic.
More precisely, the function $f : (a, b) \to \mathbb{R}$ is analytic if for each $x \in (a, b)$, there exist
a power series

$$\sum a_r h^r$$

and a $\delta > 0$ such that if $|h| < \delta$ then the series converges and

$$f(x + h) = \sum_{r=0}^{\infty} a_r h^r.$$

The concept of series convergence will be discussed further in Section 3 and Chapter 4.
Among other things we show in Section 2 of Chapter 4 that analytic functions are
smooth, and if $f(x + h) = \sum a_r h^r$ then

$$f^{(r)}(x) = r!\, a_r.$$

This gives uniqueness of the power series expression of a function: if two power
series express the same function $f$ at $x$ then they have identical coefficients, namely
$f^{(r)}(x)/r!$. See Exercise 4.38 for a stronger type of uniqueness, namely the identity
theorem for analytic functions.

We write $C^\omega$ for the class of analytic functions.
A Nonanalytic Smooth Function

The fact that smooth functions need not be analytic is somewhat surprising; i.e.,
$C^\omega$ is a proper subset of $C^\infty$. A standard example is

$$e(x) = \begin{cases} e^{-1/x} & \text{if } x > 0 \\ 0 & \text{if } x \le 0. \end{cases}$$

Its smoothness is left as an exercise in the use of L’Hôpital’s Rule and induction,
Exercise 17. At $x = 0$ the graph of $e(x)$ is infinitely tangent to the $x$-axis. Every
derivative $e^{(r)}(0) = 0$. See Figure 65.

Figure 65 The graph of $e(x) = e^{-1/x}$

It follows that $e(x)$ is not analytic. For if it were then it could be expressed near
$x = 0$ as a convergent series $e(h) = \sum a_r h^r$, and $a_r = e^{(r)}(0)/r!$. Thus $a_r = 0$ for each
$r$, and the series converges to zero, whereas $e(h)$ is different from zero when $h > 0$.
Although not analytic at $x = 0$, $e(x)$ is analytic elsewhere. See also Exercise 4.37.




Taylor Approximation 


The $r$th-order Taylor polynomial of an $r$th-order differentiable function $f$ at $x$
is

$$P(h) = f(x) + f'(x) h + \frac{f''(x)}{2!} h^2 + \cdots + \frac{f^{(r)}(x)}{r!} h^r = \sum_{k=0}^{r} \frac{f^{(k)}(x)}{k!} h^k.$$

The coefficients $f^{(k)}(x)/k!$ are constants, the variable is $h$. Differentiation of $P$ with
respect to $h$ at $h = 0$ gives

$$P(0) = f(x), \quad P'(0) = f'(x), \quad \ldots, \quad P^{(r)}(0) = f^{(r)}(x).$$


12 Taylor Approximation Theorem Assume that $f : (a, b) \to \mathbb{R}$ is $r$th-order
differentiable at $x$. Then

(a) $P$ approximates $f$ to order $r$ at $x$ in the sense that the Taylor remainder

$$R(h) = f(x + h) - P(h)$$

is $r$th-order flat at $h = 0$; i.e., $R(h)/h^r \to 0$ as $h \to 0$.

(b) The Taylor polynomial is the only polynomial of degree $\le r$ with this approximation
property.

(c) If, in addition, $f$ is $(r+1)$st-order differentiable on $(a, b)$ then for some $\theta$ between
$x$ and $x + h$ we have

$$R(h) = \frac{f^{(r+1)}(\theta)}{(r + 1)!} h^{r+1}.$$

Remark (c) is the Lagrange form of the remainder. If $|f^{(r+1)}(\theta)| \le M$ for all
$\theta \in (a, b)$ then

$$|R(h)| \le \frac{M |h|^{r+1}}{(r + 1)!},$$

an estimate that is valid uniformly with respect to $x$ and $x + h$ in $(a, b)$, whereas (a) is
only an infinitesimal pointwise estimate. Of course (c) requires stronger hypotheses
than (a).
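
The Lagrange estimate is easy to test numerically. The sketch below assumes $f = \sin$, whose derivatives cycle through $\sin, \cos, -\sin, -\cos$ and are all bounded by $M = 1$, and compares the actual remainder with $M|h|^{r+1}/(r+1)!$ for several orders $r$. The base point $x = 0.3$, the step $h = 0.5$, and the name `taylor_sin` are arbitrary choices made for illustration.

```python
import math


def taylor_sin(x, h, r):
    """r-th order Taylor polynomial of sin at x, evaluated at h."""
    derivs = [math.sin(x), math.cos(x), -math.sin(x), -math.cos(x)]
    return sum(derivs[k % 4] * h ** k / math.factorial(k) for k in range(r + 1))


x, h, M = 0.3, 0.5, 1.0
results = []
for r in range(1, 8):
    remainder = abs(math.sin(x + h) - taylor_sin(x, h, r))
    bound = M * abs(h) ** (r + 1) / math.factorial(r + 1)
    results.append((r, remainder, bound))
    print(f"r = {r}   |R(h)| = {remainder:.3e}   Lagrange bound = {bound:.3e}")
```

In each row the remainder should sit below the bound, and both shrink rapidly as $r$ grows, which is the quantitative content of (c) for this particular function.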

Proof (a) The first $r$ derivatives of $R(h)$ exist and equal 0 at $h = 0$. If $h > 0$ then
repeated applications of the Mean Value Theorem give

$$R(h) = R(h) - 0 = R'(\theta_1) h = (R'(\theta_1) - 0) h = R''(\theta_2) \theta_1 h = \cdots = R^{(r-1)}(\theta_{r-1}) \theta_{r-2} \cdots \theta_1 h$$

where $0 < \theta_{r-1} < \cdots < \theta_1 < h$. Thus

$$\left| \frac{R(h)}{h^r} \right| = \left| \frac{R^{(r-1)}(\theta_{r-1}) \theta_{r-2} \cdots \theta_1 h}{h^r} \right| \le \left| \frac{R^{(r-1)}(\theta_{r-1}) - 0}{\theta_{r-1}} \right| \to 0$$

as $h \to 0$. If $h < 0$ the same is true with $h < \theta_1 < \cdots < \theta_{r-1} < 0$.

(b) If $Q(h)$ is a polynomial of degree $\le r$, $Q \neq P$, then $Q - P$ is not $r$th-order
flat at $h = 0$, so $f(x + h) - Q(h)$ cannot be $r$th-order flat either.

(c) Fix $h > 0$ and define

$$g(t) = f(x + t) - P(t) - \frac{R(h)}{h^{r+1}} t^{r+1} = R(t) - R(h) \frac{t^{r+1}}{h^{r+1}}$$

for $0 \le t \le h$. Note that since $P(t)$ is a polynomial of degree $r$, $P^{(r+1)}(t) = 0$ for all
$t$, and

$$g^{(r+1)}(t) = f^{(r+1)}(x + t) - (r + 1)! \frac{R(h)}{h^{r+1}}.$$

Also, $g(0) = g'(0) = \cdots = g^{(r)}(0) = 0$, and $g(h) = R(h) - R(h) = 0$. Since $g = 0$
at 0 and $h$, the Mean Value Theorem gives a $t_1 \in (0, h)$ such that $g'(t_1) = 0$. Since
$g'(0) = 0$ and $g'(t_1) = 0$, the Mean Value Theorem gives a $t_2 \in (0, t_1)$ such that $g''(t_2) = 0$.
Continuing, we get a sequence $t_1 > t_2 > \cdots > t_{r+1} > 0$ such that $g^{(k)}(t_k) = 0$. The
$(r+1)$st equation, $g^{(r+1)}(t_{r+1}) = 0$, implies that

$$0 = f^{(r+1)}(x + t_{r+1}) - (r + 1)! \frac{R(h)}{h^{r+1}}.$$

Thus, $\theta = x + t_{r+1}$ makes the equation in (c) true. If $h < 0$ the argument is
symmetric. □

13 Corollary For each $r \in \mathbb{N}$ the smooth nonanalytic function $e(x)$ satisfies
$\lim_{h \to 0} e(h)/h^r = 0$.


Proof Obvious from the theorem and the fact that $e^{(r)}(0) = 0$ for all $r$.


□ 
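
The flatness asserted in the corollary can be seen numerically. This sketch evaluates $e(h)/h^r$ for a few small positive $h$ and a few exponents $r$; the sample values of $h$ and $r$ are assumptions chosen so that the numbers stay within floating-point range.

```python
import math


def e(x):
    """The smooth nonanalytic function: exp(-1/x) for x > 0, and 0 otherwise."""
    return math.exp(-1.0 / x) if x > 0 else 0.0


hs = [0.1, 0.05, 0.02, 0.01]
ratios = {r: [e(h) / h ** r for h in hs] for r in (1, 5, 10)}

for r, values in ratios.items():
    print(f"r = {r:2d}:", ["%.3e" % v for v in values])
```

Even for $r = 10$ the quotient collapses toward 0 as $h$ decreases, although $e(h)$ itself is strictly positive for every $h > 0$.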


The Taylor series at $x$ of a smooth function $f$ is the infinite Taylor polynomial

$$T(h) = \sum_{r=0}^{\infty} \frac{f^{(r)}(x)}{r!} h^r.$$

In calculus, you compute the Taylor series of functions such as $\sin x$, $\arctan x$, $e^x$,
etc. These functions are analytic: Their Taylor series converge and express them as
power series. In general, however, the Taylor series of a smooth function need not
converge to the function, and in fact it may fail to converge at all. The function
$e(x)$ is an example of the first phenomenon. Its Taylor series at $x = 0$ converges, but
gives the wrong answer. Examples of divergent and totally divergent Taylor series
are indicated in Exercise 4.37.

The convergence of a Taylor series is related to how quickly the $r$th derivative
grows (in magnitude) as $r \to \infty$. In Section 6 of Chapter 4 you will find necessary
and sufficient conditions on the growth rate that determine whether a smooth function
is analytic.

Inverse Functions 

A strictly monotone continuous function $f : (a, b) \to \mathbb{R}$ bijects $(a, b)$ onto some
interval $(c, d)$ where $c = \lim_{t \to a} f(t)$ and $d = \lim_{t \to b} f(t)$ in the increasing case. (We
permit $c$ or $d$ to be infinite.) It is a homeomorphism $(a, b) \to (c, d)$ and its inverse
function $f^{-1} : (c, d) \to (a, b)$ is also a homeomorphism. These facts were proved in
Chapter 2.

Does differentiability of $f$ imply differentiability of $f^{-1}$? If $f' \ne 0$ the answer is
“yes.” Keep in mind, however, the function $f : x \mapsto x^3$. It shows that differentiability
of $f^{-1}$ fails when $f'(x) = 0$. For the inverse function is $y \mapsto y^{1/3}$, which is not
differentiable at $y = 0$.
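The failure at $y = 0$ is easy to see numerically. The following minimal sketch (the helper names are hypothetical) computes the difference quotient of the cube root at $0$, which equals $|\Delta y|^{-2/3}$ and therefore grows without bound as $\Delta y \to 0$.

```python
import math

def cbrt(y: float) -> float:
    """Real cube root, the inverse of x -> x**3 on all of R."""
    return math.copysign(abs(y) ** (1.0 / 3.0), y)

def difference_quotient_at_zero(dy: float) -> float:
    """(cbrt(0 + dy) - cbrt(0)) / dy, which equals |dy| ** (-2/3)."""
    return (cbrt(dy) - cbrt(0.0)) / dy

if __name__ == "__main__":
    for dy in (1e-3, 1e-6, 1e-9):
        print(dy, difference_quotient_at_zero(dy))
```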

14 Inverse Function Theorem in dimension 1 If $f : (a, b) \to (c, d)$ is a differentiable
surjection and $f'(x)$ is never zero then $f$ is a homeomorphism. Its inverse
is differentiable and its derivative at $y \in (c, d)$ is

\[
(f^{-1})'(y) = \frac{1}{f'(f^{-1}(y))}.
\]

Proof If $f'$ is never zero then by the intermediate value property of derivatives, it
is either always positive or always negative. We assume for all $x$ that $f'(x) > 0$. If
$a < s < t < b$ then by the Mean Value Theorem there exists $\theta \in (s, t)$ such that
$f(t) - f(s) = f'(\theta)(t - s) > 0$. Thus $f$ is strictly monotone. Differentiability implies
continuity, so $f$ is a homeomorphism $(a, b) \to (c, d)$. To check differentiability of $f^{-1}$
at $y \in (c, d)$, define

\[
x = f^{-1}(y) \quad \text{and} \quad \Delta x = f^{-1}(y + \Delta y) - x.
\]

Then $y = f(x)$ and $\Delta y = f(x + \Delta x) - f(x) = \Delta f$. Thus

\[
\frac{\Delta f^{-1}}{\Delta y} = \frac{f^{-1}(y + \Delta y) - f^{-1}(y)}{\Delta y} = \frac{\Delta x}{\Delta y} = \frac{1}{\Delta y/\Delta x} = \frac{1}{\Delta f/\Delta x}.
\]

Since $f$ is a homeomorphism, $\Delta x \to 0$ if and only if $\Delta y \to 0$, so the limit of $\Delta f^{-1}/\Delta y$
exists as $\Delta y \to 0$ and equals $1/f'(x) = 1/f'(f^{-1}(y))$. □
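The formula for the derivative of the inverse can be checked numerically. The sketch below is only an illustration under stated assumptions: the function $f(x) = x^3 + x$ is chosen here (it is not from the text) because $f'(x) = 3x^2 + 1$ never vanishes, and the inverse is computed by bisection rather than in closed form.

```python
def f(x: float) -> float:
    """Illustrative function with f'(x) = 3x^2 + 1 > 0 everywhere."""
    return x**3 + x

def f_prime(x: float) -> float:
    return 3.0 * x**2 + 1.0

def f_inverse(y: float, lo: float = -10.0, hi: float = 10.0) -> float:
    """Invert the increasing function f by bisection on [lo, hi]."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f(mid) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def inverse_derivative_numeric(y: float, dy: float = 1e-6) -> float:
    """Symmetric difference quotient of f_inverse at y."""
    return (f_inverse(y + dy) - f_inverse(y - dy)) / (2.0 * dy)

def inverse_derivative_formula(y: float) -> float:
    """The value 1 / f'(f^{-1}(y)) given by Theorem 14."""
    return 1.0 / f_prime(f_inverse(y))

if __name__ == "__main__":
    y = 2.0  # f(1) = 2, so the inverse derivative should be 1 / f'(1) = 1/4
    print(inverse_derivative_numeric(y), inverse_derivative_formula(y))
```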
A longer but more geometric proof of the one-dimensional inverse function theorem
can be done in two steps.

(i) A function is differentiable if and only if its graph is differentiable. 

(ii) The graph of / -1 is the reflection of the graph of / across the diagonal, and is 
thus differentiable. 

See Figure 66. 



Figure 66 A picture proof of the inverse function theorem in R 

If a homeomorphism $f$ and its inverse are both of class $C^r$, $r \ge 1$, then $f$ is a $C^r$
diffeomorphism.

15 Corollary If $f : (a, b) \to (c, d)$ is a homeomorphism of class $C^r$, $1 \le r \le \infty$, and
for all $x \in (a, b)$ we have $f'(x) \ne 0$ then $f$ is a $C^r$ diffeomorphism.

Proof If $r = 1$, the formula $(f^{-1})'(y) = 1/f'(f^{-1}(y))$ implies that $(f^{-1})'(y)$ is
continuous, so $f$ is a $C^1$ diffeomorphism. The Rules of Differentiation and induction
on $r \ge 2$ complete the proof. □




The corollary remains true for analytic functions - the inverse of an analytic 
function with nonvanishing derivative is analytic. The generalization of the inverse 
function theorem to higher dimensions is a principal goal of Chapter 5. 


2 Riemann Integration 

Let $f : [a, b] \to \mathbb{R}$ be given. Intuitively, the integral of $f$ is the area under its graph;
i.e., for $f \ge 0$ we have

\[
\int_a^b f(x)\,dx = \operatorname{area} \mathcal{U}
\]

where $\mathcal{U}$ is the undergraph of $f$,

\[
\mathcal{U} = \{(x, y) : a \le x \le b \text{ and } 0 \le y \le f(x)\}.
\]


The precise definition involves approximation. A partition pair consists of two finite
sets of points $P, T \subset [a, b]$ where $P = \{x_0, \ldots, x_n\}$ and $T = \{t_1, \ldots, t_n\}$ are interlaced
as

\[
a = x_0 \le t_1 \le x_1 \le t_2 \le x_2 \le \cdots \le t_n \le x_n = b.
\]

We assume the points $x_0, \ldots, x_n$ are distinct. The Riemann sum corresponding to
$f, P, T$ is

\[
R(f, P, T) = \sum_{i=1}^{n} f(t_i)\Delta x_i = f(t_1)\Delta x_1 + f(t_2)\Delta x_2 + \cdots + f(t_n)\Delta x_n
\]

where $\Delta x_i = x_i - x_{i-1}$. The Riemann sum $R$ is the area of rectangles which approximate
the area under the graph of $f$. See Figure 67. Think of the points $t_i$ as sample
points. We sample the value of the function $f$ at $t_i$.

The mesh of the partition $P$ is the length of the largest subinterval $[x_{i-1}, x_i]$. A
partition with large mesh is coarse; one with small mesh is fine. In general, the finer
the better. A real number $I$ is the Riemann integral of $f$ over $[a, b]$ if it satisfies
the following approximation condition:

$\forall \epsilon > 0$ $\exists \delta > 0$ such that if $P, T$ is any partition pair then

\[
\operatorname{mesh} P < \delta \;\Rightarrow\; |R - I| < \epsilon
\]

where $R = R(f, P, T)$. If such an $I$ exists it is unique, we denote it as

\[
\int_a^b f(x)\,dx = I = \lim_{\operatorname{mesh} P \to 0} R(f, P, T)
\]

and we say that $f$ is Riemann integrable with Riemann integral $I$.
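The definitions of partition pair, Riemann sum, and mesh translate directly into code. The following is a minimal sketch rather than a definitive implementation; the function names and the choice of midpoints as sample points in `uniform_pair` are assumptions made for illustration.

```python
from typing import Callable, List, Sequence, Tuple

def riemann_sum(f: Callable[[float], float],
                P: Sequence[float],
                T: Sequence[float]) -> float:
    """R(f, P, T) = sum of f(t_i) * (x_i - x_{i-1}) for a partition pair P, T."""
    if len(T) != len(P) - 1:
        raise ValueError("need exactly one sample point per subinterval")
    total = 0.0
    for i in range(1, len(P)):
        if not (P[i - 1] <= T[i - 1] <= P[i]):
            raise ValueError("sample points must interlace the partition")
        total += f(T[i - 1]) * (P[i] - P[i - 1])
    return total

def mesh(P: Sequence[float]) -> float:
    """Length of the largest subinterval of the partition P."""
    return max(P[i] - P[i - 1] for i in range(1, len(P)))

def uniform_pair(a: float, b: float, n: int) -> Tuple[List[float], List[float]]:
    """Uniform partition of [a, b] into n pieces, with midpoints as samples."""
    P = [a + (b - a) * i / n for i in range(n + 1)]
    T = [0.5 * (P[i - 1] + P[i]) for i in range(1, n + 1)]
    return P, T

if __name__ == "__main__":
    P, T = uniform_pair(0.0, 1.0, 1000)
    print(mesh(P), riemann_sum(lambda x: x * x, P, T))  # close to 1/3
```

The definition of the integral quantifies over every partition pair of small mesh, so a computation with one particular family of pairs only illustrates the limit; it does not prove integrability.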
Figure 67 The area of the strip is $f(t_i)\Delta x_i$.



16 Theorem If f is Riemann integrable then it is bounded. 


Proof Suppose not. Let $I = \int_a^b f(x)\,dx$. There is some $\delta > 0$ such that for all
partition pairs with $\operatorname{mesh} P < \delta$, we have $|R - I| < 1$. Fix such a partition pair
$P = \{x_0, \ldots, x_n\}$, $T = \{t_1, \ldots, t_n\}$. If $f$ is unbounded on $[a, b]$ then there is also a
subinterval $[x_{i_0-1}, x_{i_0}]$ on which it is unbounded. Choose a new set $T' = \{t'_1, \ldots, t'_n\}$
with $t'_i = t_i$ for all $i \ne i_0$ and choose $t'_{i_0}$ such that

\[
|f(t'_{i_0}) - f(t_{i_0})|\,\Delta x_{i_0} > 2.
\]

This is possible since the supremum of $\{|f(t)| : x_{i_0-1} \le t \le x_{i_0}\}$ is $\infty$. Let $R' =
R(f, P, T')$. Then $|R - R'| > 2$, contrary to the fact that both $R$ and $R'$ differ from
$I$ by $< 1$. □


Let $\mathcal{R}$ denote the set of all functions that are Riemann integrable over $[a, b]$.

17 Theorem (Linearity of the Integral)

(a) $\mathcal{R}$ is a vector space and $f \mapsto \int_a^b f(x)\,dx$ is a linear map $\mathcal{R} \to \mathbb{R}$.

(b) The constant function $h(x) = k$ is integrable and its integral is $k(b - a)$.

Proof (a) Riemann sums behave naturally under linear combination:

\[
R(f + cg, P, T) = R(f, P, T) + cR(g, P, T),
\]

and it follows that their limits, as $\operatorname{mesh} P \to 0$, give the expected formula

\[
\int_a^b f(x) + cg(x)\,dx = \int_a^b f(x)\,dx + c\int_a^b g(x)\,dx.
\]

(b) Every Riemann sum for the constant function $h(x) = k$ is $k(b - a)$, so its integral
equals this number too. □

18 Theorem (Monotonicity of the Integral) If $f, g \in \mathcal{R}$ and $f \le g$ then

\[
\int_a^b f(x)\,dx \le \int_a^b g(x)\,dx.
\]

Proof For each partition pair $P, T$, we have $R(f, P, T) \le R(g, P, T)$. □

19 Corollary If $f \in \mathcal{R}$ and $|f| \le M$ then $\left|\int_a^b f(x)\,dx\right| \le M(b - a)$.

Proof By Theorem 17, the constant functions $\pm M$ are integrable. By Theorem 18,
$-M \le f(x) \le M$ implies that

\[
-M(b - a) \le \int_a^b f(x)\,dx \le M(b - a).
\]

□


Darboux Integrability 

The lower sum and upper sum of a function $f : [a, b] \to [-M, M]$ with respect
to a partition $P$ of $[a, b]$ are

\[
L(f, P) = \sum_{i=1}^{n} m_i \Delta x_i \quad \text{and} \quad U(f, P) = \sum_{i=1}^{n} M_i \Delta x_i
\]

where

\[
m_i = \inf\{f(t) : x_{i-1} \le t \le x_i\} \qquad M_i = \sup\{f(t) : x_{i-1} \le t \le x_i\}.
\]

We assume $f$ is bounded in order to be sure that $m_i$ and $M_i$ are real numbers. Clearly

\[
L(f, P) \le R(f, P, T) \le U(f, P)
\]

for all partition pairs $P, T$. See Figure 68.

Figure 68 The upper sum, a Riemann sum, and the lower sum


The lower integral and upper integral of $f$ over $[a, b]$ are

\[
\underline{I} = \sup_P L(f, P) \quad \text{and} \quad \overline{I} = \inf_P U(f, P).
\]

$P$ ranges over all partitions of $[a, b]$ when we take the supremum and infimum. If the
lower and upper integrals of $f$ are equal, $\underline{I} = \overline{I}$, then $f$ is Darboux integrable and
their common value is its Darboux integral.


20 Theorem Riemann integrability is equivalent to Darboux integrability, and when
a function is integrable, its three integrals - lower, upper, and Riemann - are equal.

To prove Theorem 20 it is convenient to refine a partition $P$ by adding more
partition points. The partition $P'$ refines $P$ if $P' \supset P$.

Suppose first that $P' = P \cup \{w\}$ where $w \in (x_{i_0-1}, x_{i_0})$. The lower sums for
$P$ and $P'$ are the same except that $m_{i_0}\Delta x_{i_0}$ in $L(f, P)$ splits into two terms in
$L(f, P')$. The sum of the two terms is at least as large as $m_{i_0}\Delta x_{i_0}$. For the infimum
of $f$ over the intervals $[x_{i_0-1}, w]$ and $[w, x_{i_0}]$ is at least as large as $m_{i_0}$. Similarly,
$U(f, P') \le U(f, P)$. See Figure 69.

Repetition continues the pattern and we formalize it as the

Refinement Principle Refining a partition causes the lower sum to increase and
the upper sum to decrease.
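The Refinement Principle can be observed on a small example. The sketch below is restricted, as a simplifying assumption, to nondecreasing functions, because for those the infimum and supremum on each subinterval are attained at the left and right endpoints and can be computed exactly. The function names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

def lower_upper_sums(f: Callable[[float], float],
                     P: Sequence[float]) -> Tuple[float, float]:
    """Return (L(f, P), U(f, P)) for a nondecreasing f.

    Assumption: f is nondecreasing, so on [x_{i-1}, x_i] the infimum is
    f(x_{i-1}) and the supremum is f(x_i).
    """
    L = sum(f(P[i - 1]) * (P[i] - P[i - 1]) for i in range(1, len(P)))
    U = sum(f(P[i]) * (P[i] - P[i - 1]) for i in range(1, len(P)))
    return L, U

def refine(P: Sequence[float], w: float) -> List[float]:
    """Return the partition P' = P with the point w adjoined, in increasing order."""
    return sorted(set(P) | {w})

if __name__ == "__main__":
    square = lambda x: x * x          # nondecreasing on [0, 1]
    P = [0.0, 0.5, 1.0]
    print(lower_upper_sums(square, P))                 # sums for P
    print(lower_upper_sums(square, refine(P, 0.75)))   # L goes up, U goes down
```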



Figure 69 Refinement increases $L$ and decreases $U$. (Panel labels: lower summand, refined lower summand, refined upper summand.)


The common refinement $P^*$ of two partitions $P, P'$ of $[a, b]$ is

\[
P^* = P \cup P'.
\]

According to the Refinement Principle we have

\[
L(f, P) \le L(f, P^*) \le U(f, P^*) \le U(f, P').
\]

We conclude that each lower sum is less than or equal to each upper sum, the lower
integral is less than or equal to the upper, and thus

(2) A bounded function $f : [a, b] \to \mathbb{R}$ is Darboux integrable
if and only if $\forall \epsilon > 0$ $\exists P$ such that $U(f, P) - L(f, P) < \epsilon$.
Proof of Theorem 20 Let $f : [a, b] \to \mathbb{R}$. We assert that $f$ is Riemann integrable
if and only if it is Darboux integrable. One direction is easy: Riemann $\Rightarrow$ Darboux.
Riemann integrability implies that $f$ is bounded and that for each $\epsilon > 0$ there exists
a $\delta > 0$ such that if $P$ is any partition with $\operatorname{mesh} P < \delta$ then

\[
|R - I| < \frac{\epsilon}{4}
\]

where $R = R(f, P, T)$ and $I$ is the Riemann integral of $f$. Fix such a partition $P$ and
choose a set of sample points $T = \{t_i\}$ such that $f(t_i)$ is so near $m_i$ that

\[
R(f, P, T) - L(f, P) < \frac{\epsilon}{4}.
\]

(It is enough to choose $t_i \in [x_{i-1}, x_i]$ such that $f(t_i) - m_i < \epsilon/(4(b - a))$.) Choose a
second set of sample points $T' = \{t'_i\}$ so that

\[
U(f, P) - R(f, P, T') < \frac{\epsilon}{4}.
\]

Both $R = R(f, P, T)$ and $R' = R(f, P, T')$ differ from $I$ by $< \epsilon/4$. Thus

\[
U - L = (U - R') + (R' - I) + (I - R) + (R - L) < \epsilon,
\]

from which (2) gives Darboux integrability. Since $\underline{I}$, $\overline{I}$, $I$ are fixed numbers that
belong to the interval $[L, U]$ of length $\epsilon$, and $\epsilon$ is arbitrary, the $\epsilon$-principle implies
that

\[
\underline{I} = \overline{I} = I,
\]

which completes the proof that $f$ is Darboux integrable and that the lower, upper,
and Riemann integrals are equal.


Next, we assume Darboux integrability and prove Riemann integrability. (The
proof is messier because checking Riemann integrability requires that we look at all
fine partitions $P$, not just at those for which $U - L$ is small.) Darboux integrability
implies that $f$ is bounded, say $f : [a, b] \to [-M, M]$. By (2) we know that for each
$\epsilon > 0$ there is a partition $P_1$ such that

\[
U_1 - L_1 < \frac{\epsilon}{3}
\]

where $L_1 = L(f, P_1)$ and $U_1 = U(f, P_1)$. Fix

\[
\delta = \frac{\epsilon}{6 n_1 M}
\]

where $n_1$ is the number of $P_1$-intervals, and consider a partition $P$ with $\operatorname{mesh} P < \delta$.
Fix a set $T$ of sample points for $P$. We claim that the Riemann sum $R(f, P, T)$ for
every such partition pair $P, T$ differs from the Darboux integral $I$ by less than $\epsilon$.
Then, by the $\epsilon$-principle, $f$ is Riemann integrable and its Riemann integral is $I$.

According to the Refinement Principle, the common refinement $P^* = P_1 \cup P$ has

\[
L_1 \le L^* \le U^* \le U_1
\]

where $L^* = L(f, P^*)$ and $U^* = U(f, P^*)$. Hence $U^* - L^* < \epsilon/3$.

Write $P = \{x_i\}$ and $P^* = \{x^*_j\}$ for $0 \le i \le n$ and $0 \le j \le n^*$. Since $P^*$ refines $P$
by adjoining $P_1$ to $P$, we have

\[
n \le n^* \le n + n_1.
\]

There are only $n_1 + 1$ points of $P_1$, two of which are the endpoints $a$ and $b$, which leaves
only $n_1 - 1$ points of $P_1$ that might “contaminate” $P$-intervals. See Figure 70. Except
for these contaminated $P$-intervals, each of length $< \delta$, the sums $U = \sum M_i \Delta x_i$ and
$U^* = \sum M^*_j \Delta x^*_j$ are identical. Their difference is the sum over the contaminated
$P$-intervals, of which there are fewer than $n_1$. The contaminated differences $M_i - M^*_j$
and $M_i - M^*_{j+1}$ are at worst $2M$ in magnitude. Thus

\[
U - U^* < 2M n_1 \delta = \frac{\epsilon}{3}.
\]

Similarly, $L^* - L < \epsilon/3$. Thus

\[
U - L = (U - U^*) + (U^* - L^*) + (L^* - L) < \epsilon.
\]

Since $I$ and $R = R(f, P, T)$ both belong to $[L, U]$, an interval of length $< \epsilon$, we get
$|R - I| < \epsilon$. Therefore $f$ is Riemann integrable and its Riemann integral equals the
Darboux integral $I$. □

Figure 70 $[x_{k-1}, x_k]$ is both a $P$- and a $P^*$-interval. The point $x^*_j$ belongs
to $P^* \setminus P$ and “contaminates” the $P$-interval $[x_{i-1}, x_i]$, splitting it into
$[x_{i-1}, x^*_j]$ and $[x^*_j, x_i]$. Only a few $P$-intervals get contaminated.


According to Theorem 20 and (2) we get

21 Riemann’s Integrability Criterion A bounded function is Riemann integrable if and only if
$\forall \epsilon > 0$ $\exists P$ such that $U(f, P) - L(f, P) < \epsilon$.

Example Every continuous function $f : [a, b] \to \mathbb{R}$ is Riemann integrable. (See
also Corollary 24 to the Riemann-Lebesgue Theorem, below.) Since $[a, b]$ is compact
and $f$ is continuous, $f$ is uniformly continuous. See Theorem 42 in Chapter 2. Let
$\epsilon > 0$ be given. Uniform continuity provides a $\delta > 0$ such that if $|t - s| < \delta$ then
$|f(t) - f(s)| < \epsilon/(2(b - a))$. Choose any partition $P$ with $\operatorname{mesh} P < \delta$. On each partition
interval $[x_{i-1}, x_i]$, we have $M_i - m_i < \epsilon/(b - a)$. Thus

\[
U - L = \sum_{i=1}^{n} (M_i - m_i)\Delta x_i < \frac{\epsilon}{b - a} \sum_{i=1}^{n} \Delta x_i = \epsilon.
\]

By Riemann’s Integrability Criterion $f$ is Riemann integrable.


Example The characteristic function (or indicator function) of a set $E \subset \mathbb{R}$,
$\chi_E$, takes value 1 at points of $E$ and value 0 at points of $E^c$. See Figure 71.
Some characteristic functions are Riemann integrable, while others aren’t. Riemann’s
Integrability Criterion implies that the characteristic function of an interval (including
the degenerate case that the interval is a point) is Riemann integrable. The integral
of $\chi_{[a,b]}$ is $b - a$. A step function is a finite sum of constants times characteristic
functions of intervals and is therefore Riemann integrable. A step function is a special
type of piecewise continuous function, i.e., a function that is continuous except
at finitely many points. See Figure 72. Bounded piecewise continuous functions are
Riemann integrable. See Corollary 24 below.

Figure 71 The region below the graph of a characteristic function

Figure 72 The graphs of a piecewise continuous function and a step function.

Example The characteristic function of $\mathbb{Q}$ is not integrable on $[a, b]$. It is defined as
$\chi_{\mathbb{Q}}(x) = 1$ when $x \in \mathbb{Q}$ and $\chi_{\mathbb{Q}}(x) = 0$ when $x \notin \mathbb{Q}$. See Figure 73. Every lower sum
$L(\chi_{\mathbb{Q}}, P)$ is $0$ and every upper sum is $b - a$. By Riemann’s Integrability Criterion,
$\chi_{\mathbb{Q}}$ is not integrable. Note that $\chi_{\mathbb{Q}}$ is discontinuous at every point, not merely at
rational points.

Figure 73 The graph of $\chi_{\mathbb{Q}}$ and the region below its graph

The fact that $\chi_{\mathbb{Q}}$ fails to be Riemann integrable is actually a failing of Riemann
integration theory, for the function $\chi_{\mathbb{Q}}$ is fairly tame. Its integral ought to exist and
it ought to be 0, because the undergraph is just countably many line segments of
height 1, and their area ought to be 0.

A handy consequence of Riemann’s Integrability Criterion is the

22 Sandwich Principle $f : [a, b] \to \mathbb{R}$ is Riemann integrable if, given $\epsilon > 0$, there
are functions $g, h \in \mathcal{R}$ such that $g \le f \le h$ and $\int_a^b h(x) - g(x)\,dx \le \epsilon$.

Proof For any partition $P$ it is clear that

\[
L(g, P) \le L(f, P) \le U(f, P) \le U(h, P).
\]

Let $\epsilon > 0$ be given and choose $g, h \in \mathcal{R}$ as in the hypothesis, with $\epsilon/3$ in place of $\epsilon$, so that
$\int_a^b h(x) - g(x)\,dx \le \epsilon/3$. Since $g$ and $h$ are Riemann integrable, there is a $\delta > 0$ such
that if $\operatorname{mesh} P < \delta$ then their Darboux sums differ from their integrals by $< \epsilon/3$. Thus

\[
\int_a^b g(x)\,dx - L(g, P) < \frac{\epsilon}{3} \quad \text{and} \quad U(h, P) - \int_a^b h(x)\,dx < \frac{\epsilon}{3}
\]

from which it follows that

\[
\int_a^b g(x)\,dx - \frac{\epsilon}{3} < L(g, P) \le L(f, P) \le U(f, P) \le U(h, P) < \int_a^b h(x)\,dx + \frac{\epsilon}{3}.
\]

Then $\int_a^b h(x)\,dx - \int_a^b g(x)\,dx = \int_a^b h(x) - g(x)\,dx \le \epsilon/3$ gives $U(f, P) - L(f, P) < \epsilon$
and Riemann’s Integrability Criterion implies that $f$ is Riemann integrable. See
Figure 74. □

Figure 74 The graphs of $g$ and $h$ sandwich the graph of $f$.


Example Let $f : [0, 1] \to \mathbb{Q}$ be defined as $f(p/q) = 1/q$ when $p/q \in \mathbb{Q}$ is written in
lowest terms, and $f(x) = 0$ when $x$ is irrational. This is the rational ruler function.
Note that $f$ is discontinuous at every $x \in \mathbb{Q}$ and is continuous at every $x \in \mathbb{Q}^c$. See
Figure 75. It is Riemann integrable and its integral is zero. For, given $\epsilon > 0$, we can
consider the degenerate step function

\[
s(x) = \begin{cases} 1/q & \text{if } x = p/q \in \mathbb{Q} \cap [0, 1] \text{ and } 1/q \ge \epsilon \\ 0 & \text{otherwise.} \end{cases}
\]

Then $f$ is sandwiched between the Riemann integrable functions $g = 0$ and

\[
h(x) = \epsilon\,\chi_{[0,1]}(x) + s(x).
\]

The integral of $h - g$ is $\epsilon$, so the Sandwich Principle implies that $f \in \mathcal{R}$.

Figure 75 The graph of the rational ruler function and the region below its graph
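The claim that the rational ruler function has integral zero can also be seen through its upper sums, since every lower sum is $0$. The sketch below computes $U(f, P)$ exactly for uniform partitions of $[0, 1]$, using the observation (an assumption of this illustration, not a step in the text) that the supremum of $f$ on an interval is $1/q$ for the smallest denominator $q$ of any fraction in that interval. The function names are hypothetical.

```python
import math
from fractions import Fraction

def sup_ruler(lo: Fraction, hi: Fraction) -> Fraction:
    """Supremum of the rational ruler function on [lo, hi], where lo < hi.

    The supremum is 1/q for the smallest denominator q such that some
    fraction p/q lies in [lo, hi]; that fraction is then in lowest terms.
    """
    q = 1
    while math.ceil(lo * q) > math.floor(hi * q):
        q += 1
    return Fraction(1, q)

def upper_sum_ruler(n: int) -> Fraction:
    """U(f, P) for the uniform partition of [0, 1] into n subintervals."""
    width = Fraction(1, n)
    return sum((sup_ruler(i * width, (i + 1) * width) * width
                for i in range(n)), Fraction(0))

if __name__ == "__main__":
    for n in (10, 50, 200):
        print(n, float(upper_sum_ruler(n)))
```

The upper sums decrease slowly toward $0$, which matches the Sandwich Principle argument: only finitely many points carry a value $\ge \epsilon$.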

Example Zeno’s staircase function $Z(x) = 1/2$ on the first half of $[0, 1]$, $Z(x) =
3/4$ on the next quarter of $[0, 1]$, and so on. See Figure 76. It is Riemann integrable
and its integral is $2/3$. The function has infinitely many discontinuity points, one at
each point $(2^k - 1)/2^k$. In fact, every monotone function is Riemann integrable.† See
Corollary 26 below.


†To prove this directly is not hard. The key observation to make is that a monotone function is not
much different from a continuous function. It has only jump discontinuities, and only countably many
of them; given any $\epsilon > 0$, there are only finitely many at which the jump is $\ge \epsilon$. See Exercise 1.31.
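The value $2/3$ can be checked by summing the areas of the steps. The sketch below assumes, reading "and so on" in the natural way, that $Z = 1 - 2^{-k}$ on the $k$th step, which has length $2^{-k}$; the function name is hypothetical.

```python
from fractions import Fraction

def zeno_partial_integral(n: int) -> Fraction:
    """Area under the first n steps: the sum of (1 - 2**-k) * 2**-k for k = 1..n."""
    return sum(((1 - Fraction(1, 2**k)) * Fraction(1, 2**k)
                for k in range(1, n + 1)), Fraction(0))

if __name__ == "__main__":
    for n in (1, 2, 10, 40):
        print(n, float(zeno_partial_integral(n)))
```

The part of $[0, 1]$ not covered by the first $n$ steps has length $2^{-n}$ and $Z \le 1$ there, so the integral lies between the partial sum and the partial sum plus $2^{-n}$.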
These examples raise a natural question:

Exactly which functions are Riemann integrable?

To give an answer to the question, and for many other applications, the following
concept is very useful. A set $Z \subset \mathbb{R}$ is a zero set if for each $\epsilon > 0$ there is a
countable covering of $Z$ by open intervals $(a_i, b_i)$ such that

\[
\sum_{i=1}^{\infty} (b_i - a_i) \le \epsilon.
\]

The sum of the series is the total length of the covering. Think of zero sets as
negligible. If a property holds for all points except those in a zero set then one says
that the property holds almost everywhere, abbreviated “a.e.”

23 Riemann-Lebesgue Theorem A function $f : [a, b] \to \mathbb{R}$ is Riemann integrable
if and only if it is bounded and its set of discontinuity points is a zero set.

The set $D$ of discontinuity points is exactly what its name implies.

\[
D = \{x \in [a, b] : f \text{ is discontinuous at the point } x\}.
\]

A function whose set of discontinuity points is a zero set is continuous almost everywhere.
The Riemann-Lebesgue Theorem states that a function is Riemann integrable
if and only if it is bounded and continuous almost everywhere.
Examples of zero sets are

(a) Every subset of a zero set.

(b) Every finite set.

(c) Every countable union of zero sets.

(d) Every countable set.

(e) The middle-thirds Cantor set.

(a) is clear. For if $Z_0 \subset Z$ where $Z$ is a zero set, and if $\epsilon > 0$ is given, then
there is some open covering of $Z$ by intervals whose total length is $\le \epsilon$; but the same
collection of intervals covers $Z_0$, which shows that $Z_0$ is also a zero set.

(b) Let $Z = \{z_1, \ldots, z_n\}$ be a finite set and let $\epsilon > 0$ be given. The intervals
$(z_i - \epsilon/2n, z_i + \epsilon/2n)$, for $i = 1, \ldots, n$, cover $Z$ and have total length $\epsilon$. Therefore
$Z$ is a zero set. In particular, the empty set and any single point are zero sets.

(c) This is a typical “$\epsilon/2^n$-argument.” Let $Z_1, Z_2, \ldots$ be a sequence of zero sets
and $Z = \bigcup Z_j$. We claim that $Z$ is a zero set. Let $\epsilon > 0$ be given. The set $Z_1$ can be
covered by countably many intervals $(a_{i1}, b_{i1})$ with total length $\sum (b_{i1} - a_{i1}) \le \epsilon/2$.
The set $Z_2$ can be covered by countably many intervals $(a_{i2}, b_{i2})$ with total length
$\sum (b_{i2} - a_{i2}) \le \epsilon/4$. In general, the set $Z_j$ can be covered by countably many intervals
$(a_{ij}, b_{ij})$ with total length

\[
\sum_{i=1}^{\infty} (b_{ij} - a_{ij}) \le \frac{\epsilon}{2^j}.
\]

Since the countable union of countable sets is countable, the collection of all the
intervals $(a_{ij}, b_{ij})$ is a countable covering of $Z$ by open intervals, and the total length
of all these intervals is

\[
\sum_{j=1}^{\infty} \left( \sum_{i=1}^{\infty} (b_{ij} - a_{ij}) \right) \le \sum_{j=1}^{\infty} \frac{\epsilon}{2^j} = \frac{\epsilon}{2} + \frac{\epsilon}{4} + \frac{\epsilon}{8} + \cdots = \epsilon.
\]

Thus $Z$ is a zero set and (c) is proved.

(d) This is implied by (b) and (c).

(e) Let $\epsilon > 0$ be given and choose $n \in \mathbb{N}$ such that $2^n/3^n < \epsilon$. The middle-thirds
Cantor set $C$ is contained inside $2^n$ closed intervals of length $1/3^n$, say $I_1, \ldots, I_{2^n}$.
Enlarge each closed interval $I_i$ to an open interval $(a_i, b_i) \supset I_i$ such that $b_i - a_i = \epsilon/2^n$.
(Since $1/3^n < \epsilon/2^n$, and $I_i$ has length $1/3^n$, this is possible.) The total length of these
$2^n$ intervals $(a_i, b_i)$ is $\epsilon$. Thus $C$ is a zero set.
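The construction in (e) can be carried out explicitly. The sketch below builds the $2^n$ closed intervals of the $n$th stage of the Cantor construction with exact rational arithmetic and enlarges each symmetrically to an open interval of length $\epsilon/2^n$. Symmetric enlargement is one convenient choice made here for illustration, and the function names are hypothetical.

```python
from fractions import Fraction
from typing import List, Tuple

Interval = Tuple[Fraction, Fraction]

def cantor_intervals(n: int) -> List[Interval]:
    """The 2**n closed intervals of length 3**-n remaining at stage n."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        next_stage = []
        for lo, hi in intervals:
            third = (hi - lo) / 3
            next_stage.append((lo, lo + third))   # keep the left third
            next_stage.append((hi - third, hi))   # keep the right third
        intervals = next_stage
    return intervals

def cantor_cover(eps: Fraction) -> List[Interval]:
    """Open intervals covering the Cantor set with total length exactly eps."""
    n = 0
    while Fraction(2, 3) ** n >= eps:    # choose n with 2**n / 3**n < eps
        n += 1
    target = eps / 2**n                  # length of each enlarged interval
    cover = []
    for lo, hi in cantor_intervals(n):
        pad = (target - (hi - lo)) / 2   # positive because 3**-n < eps / 2**n
        cover.append((lo - pad, hi + pad))
    return cover

if __name__ == "__main__":
    cover = cantor_cover(Fraction(1, 10))
    print(len(cover), sum(hi - lo for lo, hi in cover))
```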

It is nontrivial to prove that intervals are not zero sets. See Exercise 29. 
In the proof of the Riemann-Lebesgue Theorem, it is useful to focus on the “size”
of a discontinuity. A simple expression for this size is the oscillation of $f$ at $x$,

\[
\operatorname{osc}_x(f) = \limsup_{t \to x} f(t) - \liminf_{t \to x} f(t).
\]

Equivalently,

\[
\operatorname{osc}_x(f) = \lim_{r \to 0} \operatorname{diam} f([x - r, x + r]).
\]

(Of course, $r > 0$.) It is clear that $f$ is continuous at $x$ if and only if $\operatorname{osc}_x(f) = 0$. It
is also clear that if $I$ is any interval containing $x$ in its interior then

\[
M_I - m_I \ge \operatorname{osc}_x(f)
\]

where $M_I$ and $m_I$ are the supremum and infimum of $f(t)$ as $t$ varies in $I$. See
Figure 77.



Figure 77 The oscillation of / at x 


Proof of the Riemann-Lebesgue Theorem The set $D$ of discontinuity points of
$f : [a, b] \to [-M, M]$ naturally filters itself as the countable union

\[
D = \bigcup_{k=1}^{\infty} D_k
\]

where

\[
D_k = \{x \in [a, b] : \operatorname{osc}_x(f) \ge 1/k\}.
\]

According to (a), (c) above, $D$ is a zero set if and only if each $D_k$ is a zero set.

Assume that $f$ is Riemann integrable and let $\epsilon > 0$ and $k \in \mathbb{N}$ be given. By
Theorem 20 there is a partition $P$ such that

\[
U - L = \sum (M_i - m_i)\Delta x_i < \epsilon/k.
\]
We say that a $P$-interval $I_i$ is “bad” if it contains a point of $D_k$ in its interior.
A bad interval has a fairly big $f$-variation, namely $M_i - m_i \ge 1/k$. Since $U - L =
\sum (M_i - m_i)\Delta x_i < \epsilon/k$ is small, there cannot be too many bad intervals. (This is the
key insight in the estimates.) More precisely,

\[
\frac{\epsilon}{k} > U - L = \sum_{i=1}^{n} (M_i - m_i)\Delta x_i \ge \sum_{\text{bad}} (M_i - m_i)\Delta x_i \ge \frac{1}{k} \sum_{\text{bad}} \Delta x_i
\]

implies (by canceling the factor $1/k$ from both sides of the inequality) the sum of
the lengths of the bad intervals is $< \epsilon$. Thus, except for the finite set $D_k \cap P$, $D_k$ is
contained in finitely many open intervals whose total length is $< \epsilon$. Since finite sets
are zero sets and $\epsilon$ is arbitrary, each $D_k$ is a zero set. Therefore $D = \bigcup D_k$ is a zero
set.


Conversely, assume that the discontinuity set $D$ of $f : [a, b] \to [-M, M]$ is a zero
set. Let $\epsilon > 0$ be given. By Riemann’s Integrability Criterion, to prove that $f$ is
Riemann integrable it suffices to find $P$ with $U(f, P) - L(f, P) < \epsilon$. Choose $k \in \mathbb{N}$
so that

\[
\frac{1}{k} < \frac{\epsilon}{2(b - a)}.
\]

Since $D$ is a zero set, so is $D_k$ and hence there is a countable covering $\mathcal{J}$ of $D_k$ by
open intervals $J_j = (a_j, b_j)$ with total length

\[
\sum (b_j - a_j) \le \frac{\epsilon}{4M}.
\]

These $J_j$ are “bad” intervals: The $f$-variation on each $J_j$ is $\ge 1/k$. On the other
hand, for each $x \in [a, b] \setminus D_k$ there is an open interval $I_x$ containing $x$ such that

\[
\sup\{f(t) : t \in I_x\} - \inf\{f(t) : t \in I_x\} < 1/k.
\]

These intervals $I_x$ are a covering $\mathcal{I}$ of the good set $[a, b] \setminus D_k$. The union $\mathcal{V} = \mathcal{I} \cup \mathcal{J}$
is an open covering of $[a, b]$. Compactness of $[a, b]$ implies that $\mathcal{V}$ has a Lebesgue
number $\lambda > 0$.

Let $P = \{x_0, \ldots, x_n\}$ be any partition of $[a, b]$ having $\operatorname{mesh} P < \lambda$. We claim that
$U(f, P) - L(f, P) < \epsilon$. Each $P$-interval $I_i$ is contained wholly in some $I_x$ or wholly
in some $J_j$. (This is what Lebesgue numbers are good for.) Set

\[
J = \{i \in \{1, \ldots, n\} : I_i \text{ is contained in some bad interval } J_j\}.
\]

See Figure 78. For some finite $m$, $J_1 \cup \cdots \cup J_m$ contains those $P$-intervals $I_i$ with
$i \in J$. Then

\[
\begin{aligned}
U - L &= \sum_{i=1}^{n} (M_i - m_i)\Delta x_i \\
&\le \sum_{i \in J} (M_i - m_i)\Delta x_i + \sum_{i \notin J} (M_i - m_i)\Delta x_i \\
&\le \sum_{i \in J} 2M\,\Delta x_i + \sum_{i \notin J} \Delta x_i / k \\
&\le 2M \sum_{j=1}^{m} (b_j - a_j) + (b - a)/k \\
&< \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon.
\end{aligned}
\]

For the total length of the $P$-intervals $I_i$ contained in the bad intervals $J_1, \ldots, J_m$
is no greater than $\sum (b_j - a_j)$. As remarked at the outset, Riemann’s Integrability
Criterion then implies that $f$ is integrable. □

Figure 78 The $P$-intervals $I_i$ with large oscillation have $i \in J$ and are potentially “bad.”


The Riemann-Lebesgue Theorem has many consequences, ten of which we list as 
corollaries. 

24 Corollary Every continuous function is Riemann integrable, and so is every
bounded piecewise continuous function.

Proof The discontinuity set of a continuous function is empty, and is therefore a
zero set. The discontinuity set of a piecewise continuous function is finite, and is
therefore also a zero set. A continuous function defined on a compact interval $[a, b]$
is bounded. The piecewise continuous function was assumed to be bounded. By the
Riemann-Lebesgue Theorem, both these functions are Riemann integrable. □


25 Corollary The characteristic function of $S \subset [a, b]$ is Riemann integrable if and
only if the boundary of $S$ is a zero set.

Proof $\partial S$ is the discontinuity set of $\chi_S$. See also Exercise 5.44. □


26 Corollary Every monotone function is Riemann integrable. 


Proof The set of discontinuities of a monotone function $f : [a, b] \to \mathbb{R}$ is countable
and therefore is a zero set. (See Exercise 1.31.) Since $f$ is monotone, its values lie
in the interval between $f(a)$ and $f(b)$, so $f$ is bounded. By the Riemann-Lebesgue
Theorem, $f$ is Riemann integrable. □


27 Corollary The product of Riemann integrable functions is Riemann integrable. 


Proof Let $f, g \in \mathcal{R}$ be given. They are bounded and their product is bounded. By
the Riemann-Lebesgue Theorem their discontinuity sets, $D(f)$ and $D(g)$, are zero
sets, and $D(f) \cup D(g)$ contains the discontinuity set of the product $f \cdot g$. Since the
union of two zero sets is a zero set, the Riemann-Lebesgue Theorem implies that $f \cdot g$
is Riemann integrable. □

28 Corollary If $f : [a, b] \to [c, d]$ is Riemann integrable and $\phi : [c, d] \to \mathbb{R}$ is
continuous, then the composite $\phi \circ f$ is Riemann integrable.

Proof The discontinuity set of $\phi \circ f$ is contained in the discontinuity set of $f$, and
therefore is a zero set. Since $\phi$ is continuous and $[c, d]$ is compact, $\phi \circ f$ is bounded.
By the Riemann-Lebesgue Theorem, $\phi \circ f$ is Riemann integrable. □

29 Corollary If $f \in \mathcal{R}$ then $|f| \in \mathcal{R}$.

Proof The function $\phi : y \mapsto |y|$ is continuous, so $x \mapsto |f(x)|$ is Riemann integrable
according to Corollary 28. □


30 Corollary If $a < c < b$ and $f : [a, b] \to \mathbb{R}$ is Riemann integrable then its restrictions
to $[a, c]$ and $[c, b]$ are Riemann integrable and

\[
\int_a^b f(x)\,dx = \int_a^c f(x)\,dx + \int_c^b f(x)\,dx.
\]

Conversely, Riemann integrability on $[a, c]$ and $[c, b]$ implies Riemann integrability on
$[a, b]$.

Proof See Figure 79. The union of the discontinuity sets for the restrictions of $f$ to
the subintervals $[a, c]$, $[c, b]$ is the discontinuity set of $f$. The latter is a zero set if and
only if the former two are, and so by the Riemann-Lebesgue Theorem, $f$ is Riemann
integrable if and only if its restrictions to $[a, c]$ and $[c, b]$ are.

Let $\chi_{[a,c]}$ and $\chi_{(c,b]}$ be the characteristic functions of $[a, c]$ and $(c, b]$. By Corollary 24
they are integrable, and by Corollary 27, so are the products $\chi_{[a,c]} \cdot f$ and
$\chi_{(c,b]} \cdot f$. Since

\[
f = \chi_{[a,c]} \cdot f + \chi_{(c,b]} \cdot f
\]

the addition formula follows from linearity of the integral, Theorem 17. □

Figure 79 Additivity of the integral is equivalent to additivity of area.


31 Corollary If $f : [a, b] \to [0, M]$ is Riemann integrable and has integral zero then
$f(x) = 0$ at every continuity point $x$ of $f$. Thus $f(x) = 0$ almost everywhere.

Proof Suppose not: Let $x_0$ be a continuity point of $f$ and assume that $f(x_0) > 0$.
Then for some $\delta > 0$ and each $x \in (x_0 - \delta, x_0 + \delta)$ we have $f(x) \ge f(x_0)/2$. The
function

\[
g(x) = \begin{cases} \dfrac{f(x_0)}{2} & \text{if } x \in (x_0 - \delta, x_0 + \delta) \\ 0 & \text{otherwise} \end{cases}
\]

satisfies $0 \le g(x) \le f(x)$ everywhere. See Figure 80. By monotonicity of the integral,
Theorem 18, we have

\[
f(x_0)\delta = \int_a^b g(x)\,dx \le \int_a^b f(x)\,dx = 0,
\]

a contradiction. Hence $f(x) = 0$ at every continuity point. □

Figure 80 The shaded rectangle prevents the integral of $f$ being zero.


Corollary 28 and Exercises 33, 35, 47, 49 deal with the way that Riemann integrability
behaves under composition. If $f \in \mathcal{R}$ and $\phi$ is continuous then $\phi \circ f \in \mathcal{R}$,
although the composition in the other order, $f \circ \phi$, may fail to be integrable. Continuity
is too weak a hypothesis for such a “change of variable.” See Exercise 35. In
particular, the composite of Riemann integrable functions may fail to be Riemann
integrable. See Exercise 33. However, we have the following result.

32 Corollary If $f$ is Riemann integrable and $\psi$ is a homeomorphism whose inverse
satisfies a Lipschitz condition then $f \circ \psi$ is Riemann integrable.

Proof More precisely, we assume that $f : [a, b] \to \mathbb{R}$ is Riemann integrable, $\psi$ bijects
$[c, d]$ onto $[a, b]$, $\psi(c) = a$, $\psi(d) = b$, and for some constant $K$ and all $s, t \in [a, b]$ we
have

\[
|\psi^{-1}(s) - \psi^{-1}(t)| \le K|s - t|.
\]

We then assert that $f \circ \psi$ is a Riemann integrable function $[c, d] \to \mathbb{R}$.

Let $D$ be the set of discontinuity points of $f$. Then $D' = \psi^{-1}(D)$ is the set of
discontinuity points of $f \circ \psi$. Let $\epsilon > 0$ be given. There is an open covering of
$D$ by intervals $(a_i, b_i)$ whose total length is $\le \epsilon/K$. The homeomorphic intervals
$(a'_i, b'_i) = \psi^{-1}(a_i, b_i)$ cover $D'$ and have total length

\[
\sum (b'_i - a'_i) \le \sum K(b_i - a_i) \le \epsilon.
\]

Therefore $D'$ is a zero set and by the Riemann-Lebesgue Theorem, $f \circ \psi$ is integrable. □
33 Corollary If $f \in \mathcal{R}$ and $\psi : [c, d] \to [a, b]$ is a $C^1$ diffeomorphism then $f \circ \psi$ is
Riemann integrable.

Proof The hypothesis that $\psi$ is a $C^1$ diffeomorphism means that it is a continuously
differentiable homeomorphism whose inverse is also continuously differentiable. By
the Mean Value Theorem, for all $s, t \in [a, b]$ we have

\[
|\psi^{-1}(s) - \psi^{-1}(t)| \le K|s - t|
\]

where $K = \max_{x \in [a, b]} |(\psi^{-1})'(x)|$. By Corollary 32, $f \circ \psi$ is Riemann integrable. □

Versions of the preceding theorem and corollary remain true without the hypotheses
that $\psi$ bijects. The proofs are harder because $\psi$ can fold infinitely often.
See Exercises 42 and 44.

In calculus you learn that the derivative of the integral is the integrand. This we 
now prove. 


34 Fundamental Theorem of Calculus If $f : [a, b] \to \mathbb{R}$ is Riemann integrable
then its indefinite integral

\[
F(x) = \int_a^x f(t)\,dt
\]

is a continuous function of $x$. The derivative of $F(x)$ exists and equals $f(x)$ at every
point $x$ at which $f$ is continuous.

Proof Obvious from Figure 81. □

Figure 81 Why does this picture give a proof of the Fundamental Theorem
of Calculus?
Proof Since $f$ is Riemann integrable, it is bounded; say $|f(x)| \le M$ for all $x$.
By Corollary 30 we have

\[
|F(y) - F(x)| = \left| \int_x^y f(t)\,dt \right| \le M|y - x|.
\]

Therefore $F$ is continuous. Given $\epsilon > 0$, choose $\delta < \epsilon/M$, and observe that
$|y - x| < \delta$ implies that $|F(y) - F(x)| < M\delta < \epsilon$. In exactly the same way, if $f$ is
continuous at $x$ then

\[
\frac{F(x + h) - F(x)}{h} = \frac{1}{h} \int_x^{x+h} f(t)\,dt \to f(x)
\]

as $h \to 0$. For if

\[
m(x, h) = \inf\{f(s) : |s - x| \le |h|\} \qquad M(x, h) = \sup\{f(s) : |s - x| \le |h|\}
\]

then

\[
m(x, h) = \frac{1}{h} \int_x^{x+h} m(x, h)\,dt \le \frac{1}{h} \int_x^{x+h} f(t)\,dt \le \frac{1}{h} \int_x^{x+h} M(x, h)\,dt = M(x, h).
\]

When $f$ is continuous at $x$, $m(x, h)$ and $M(x, h)$ converge to $f(x)$ as $h \to 0$, and so
must the integral sandwiched between them,

\[
\frac{1}{h} \int_x^{x+h} f(t)\,dt \to f(x).
\]

(If $h < 0$ then $\frac{1}{h} \int_x^{x+h} f(t)\,dt$ is interpreted as $-\frac{1}{h} \int_{x+h}^{x} f(t)\,dt$.) □
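The conclusion $F'(x) = f(x)$ at continuity points can be observed numerically. The sketch below approximates the indefinite integral by a midpoint Riemann sum and then forms a difference quotient. The midpoint rule, the step count, and the function names are choices made for this illustration, not part of the proof.

```python
import math
from typing import Callable

def indefinite_integral(f: Callable[[float], float],
                        a: float, x: float, n: int = 20000) -> float:
    """Approximate F(x) = integral of f from a to x by a midpoint Riemann sum."""
    h = (x - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

def difference_quotient(f: Callable[[float], float],
                        a: float, x: float, h: float) -> float:
    """(F(x + h) - F(x)) / h, which should be near f(x) where f is continuous."""
    return (indefinite_integral(f, a, x + h) - indefinite_integral(f, a, x)) / h

if __name__ == "__main__":
    for h in (1e-1, 1e-2, 1e-3):
        print(h, difference_quotient(math.cos, 0.0, 1.0, h), math.cos(1.0))
```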


35 Corollary The derivative of an indefinite Riemann integral exists almost every- 
where and equals the integrand almost everywhere. 


Proof Assume that $f : [a, b] \to \mathbb{R}$ is Riemann integrable and $F(x)$ is its indefinite
integral. By the Riemann-Lebesgue Theorem, $f$ is continuous almost everywhere,
and by the Fundamental Theorem of Calculus, $F'(x)$ exists and equals $f(x)$ wherever
$f$ is continuous. □
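
The role of the continuity points can be seen in a small experiment. The sketch below is only an illustration under stated assumptions: the integrand is a jump function on [-1, 1], its indefinite integral is written in closed form as max(x, 0), and the names `f`, `F`, and `difference_quotient` are chosen here for the example.

```python
def f(t):
    """Jump function: 0 for t <= 0 and 1 for t > 0 (Riemann integrable on [-1, 1])."""
    return 0.0 if t <= 0 else 1.0


def F(x):
    """Indefinite integral of f from -1 to x, written in closed form."""
    return max(x, 0.0)


def difference_quotient(x, h):
    """The quotient (F(x + h) - F(x)) / h whose limit defines F'(x)."""
    return (F(x + h) - F(x)) / h


if __name__ == "__main__":
    for h in (1e-2, 1e-4, 1e-6):
        print(
            h,
            difference_quotient(0.5, h),   # tends to f(0.5) = 1
            difference_quotient(-0.5, h),  # tends to f(-0.5) = 0
            difference_quotient(0.0, h),   # right-hand quotient at the jump
            difference_quotient(0.0, -h),  # left-hand quotient at the jump
        )
```

At the continuity points the quotients settle on the value of the integrand. At the single discontinuity the one-sided quotients disagree, so F is continuous there but not differentiable, which is consistent with the derivative existing only almost everywhere.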


A second version of the Fundamental Theorem of Calculus concerns antiderivatives.
If one function is the derivative of another, the second function is an
antiderivative of the first.

Note When $G$ is an antiderivative of $g : [a, b] \to \mathbb{R}$, we have

$$G'(x) = g(x)$$

for every $x \in [a, b]$, not merely for almost every $x \in [a, b]$.


36 Corollary Every continuous function has an antiderivative. 


Proof Assume that $f : [a, b] \to \mathbb{R}$ is continuous. By the Fundamental Theorem of
Calculus, the indefinite integral $F(x)$ has a derivative everywhere, and $F'(x) = f(x)$
everywhere. □


Some discontinuous functions have an antiderivative and others don’t. Surprisingly,
the wildly oscillating function

$$f(x) = \begin{cases} 0 & \text{if } x \le 0 \\ \sin\dfrac{1}{x} & \text{if } x > 0 \end{cases}$$

has an antiderivative, but the jump function

$$g(x) = \begin{cases} 0 & \text{if } x \le 0 \\ 1 & \text{if } x > 0 \end{cases}$$

does not. See Exercise 40.

37 Antiderivative Theorem An antiderivative of a Riemann integrable function,
if it exists, differs from the indefinite integral by a constant.


Proof We assume that $f : [a, b] \to \mathbb{R}$ is Riemann integrable, that $G$ is an antiderivative
of $f$, and we assert that for all $x \in [a, b]$ we have

$$G(x) = \int_a^x f(t)\,dt + C,$$

where $C$ is a constant. (In fact, $C = G(a)$.) Partition $[a, x]$ as

$$a = x_0 < x_1 < \dots < x_n = x$$

and choose $t_k \in [x_{k-1}, x_k]$ such that

$$G(x_k) - G(x_{k-1}) = G'(t_k)\,\Delta x_k.$$
Such a $t_k$ exists by the Mean Value Theorem applied to the differentiable function
$G$. Telescoping gives

$$G(x) - G(a) = \sum_{k=1}^{n} \bigl( G(x_k) - G(x_{k-1}) \bigr) = \sum_{k=1}^{n} f(t_k)\,\Delta x_k,$$

which is a Riemann sum for $f$ on the interval $[a, x]$. Since $f$ is Riemann integrable,
the Riemann sum converges to $F(x)$ as the mesh of the partition tends to zero. This
gives $G(x) - G(a) = F(x)$ as claimed. □


38 Corollary Standard integral formulas, such as

$$\int_a^b x^2\,dx = \frac{b^3 - a^3}{3},$$

are valid.


Proof Every integral formula is actually a derivative formula, and the Antiderivative 
Theorem converts derivative formulas to integral formulas. □ 

Example The logarithm function is defined as the integral,

$$\log x = \int_1^x \frac{1}{t}\,dt.$$

Since the integrand $1/t$ is well defined and continuous when $t > 0$, $\log x$ is well
defined and differentiable for $x > 0$. Its derivative is $1/x$. By the way, as is standard
in post-calculus vocabulary, $\log x$ refers to the natural logarithm, not to the base-10
logarithm. See also Exercise 16.
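
The definition of the logarithm as an integral can be checked numerically. The following is a rough sketch, assuming a midpoint Riemann sum with equal subintervals is an adequate approximation; `midpoint_integral` and `log_by_integral` are names introduced here, and `math.log` serves only as an independent reference value.

```python
import math


def midpoint_integral(func, a, b, n=10_000):
    """Midpoint Riemann sum of func over [a, b] using n equal subintervals."""
    h = (b - a) / n
    return h * sum(func(a + (k + 0.5) * h) for k in range(n))


def log_by_integral(x, n=10_000):
    """Approximate log x as the integral of 1/t from 1 to x, for x > 0."""
    # When 0 < x < 1 the step h is negative, so the result is negative.
    return midpoint_integral(lambda t: 1.0 / t, 1.0, x, n)


if __name__ == "__main__":
    for x in (0.5, 2.0, 10.0):
        print(x, log_by_integral(x), math.log(x))
    # The difference quotient at x = 2 should be close to the integrand 1/2.
    h = 1e-4
    print((log_by_integral(2.0 + h) - log_by_integral(2.0)) / h)
```

The last line illustrates the Fundamental Theorem of Calculus for this example: the difference quotient of the integral approaches the integrand's value at the point.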


An antiderivative of $f$ has $G'(x) = f(x)$ everywhere, and differs from the indefinite
integral $F(x)$ by a constant. But what if we assume instead that $H'(x) = f(x)$
almost everywhere? Should this not also imply $H(x)$ differs from $F(x)$ by a constant?
Surprisingly, the answer is “no.”

37 Theorem There exists a continuous function $H : [0, 1] \to \mathbb{R}$ whose derivative
exists and equals zero almost everywhere, but which is not constant.

Proof The counterexample is the Devil’s staircase function, also called the Cantor
function. Its graph is shown in Figure 82 and it is defined as follows.

Each $x \in [0, 1]$ has a base-3 expansion $(.\omega_1\omega_2\omega_3\ldots)_3$ where

$$x = \sum_{i=1}^{\infty} \frac{\omega_i}{3^i}.$$
Figure 82 The Devil’s staircase


Each $\omega_i$ is 0, 1, or 2. If $x \in C$, the standard Cantor set constructed in Chapter 2,
then $x$ has a unique expansion in which each $\omega_i$ equals 0 or 2. The function $H$ sends
$x \in C$ to

$$H(x) = \sum_{i=1}^{\infty} \frac{\omega_i/2}{2^i}.$$

$H$ has equal values at the endpoints of the discarded gap intervals and so we extend
$H$ to them by letting it be constant on each. This accounts for the steps in its graph.
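
The digit rule above translates directly into a short computation. The sketch below is one possible way to evaluate the staircase, not the text's construction verbatim: it reads base-3 digits one at a time, halves each digit 0 or 2 into a binary digit, and stops at the first digit 1, where the point lies in the closure of a discarded gap. The name `cantor_function` and the `depth` cutoff are assumptions made for the example; exact `Fraction` inputs avoid floating-point digit errors.

```python
from fractions import Fraction


def cantor_function(x, depth=40):
    """Approximate the Devil's staircase H(x) for 0 <= x <= 1."""
    if not 0 <= x <= 1:
        raise ValueError("x must lie in [0, 1]")
    if x == 1:
        return 1.0
    value, scale = 0.0, 0.5
    for _ in range(depth):
        x = x * 3
        digit = int(x)  # next base-3 digit of x: 0, 1, or 2
        x = x - digit
        if digit == 1:
            # x is in the closure of a discarded gap, where H is constant.
            return value + scale
        value += scale * (digit // 2)  # base-3 digit 0 or 2 becomes binary 0 or 1
        scale *= 0.5
    return value


if __name__ == "__main__":
    for num, den in ((0, 1), (1, 9), (1, 4), (1, 3), (1, 2), (7, 9), (1, 1)):
        print(f"H({num}/{den}) =", cantor_function(Fraction(num, den)))
```

Both endpoints of the gap (1/3, 2/3) give the value 1/2, which matches the claim that H takes equal values at the endpoints of each discarded interval.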

There are two things to check - the definition of $H$ makes sense and $H$ has the
properties asserted. Continuity of the map $H : C \to [0, 1]$ is simple. As we showed
in Chapter 2, $C$ is the nested intersection $\bigcap C^n$ where $C^n$ is the disjoint union of $2^n$
intervals of length $1/3^n$, the endpoints of which are fractions with denominator $3^n$.
Between the intervals $C_\alpha$ in $C^n$ there are open discarded intervals of length $\ge 1/3^n$.
Let $\epsilon > 0$ be given, choose $n$ with $1/2^n < \epsilon$, and take $\delta = 1/3^n$. If $x, x' \in C$ have
$|x - x'| < \delta = 1/3^n$ then they lie in a common interval $C_\alpha$ in $C^n$. For the distance
between different intervals $C_\alpha$, $C_\beta$ in $C^n$ is at least $1/3^n$. Therefore the base-3
expansions of $x$ and $x'$ agree for the first $n$ terms, which implies

$$|H(x) - H(x')| \le \sum_{j=n+1}^{\infty} \frac{1}{2^j} = \frac{1}{2^n} < \epsilon$$

and gives continuity on $C$.
At stage $n$ in the Cantor set construction we discard the open middle third of an
interval $C_\alpha = [\ell_\alpha, \ell_\alpha + 1/3^n]$, where the left endpoint is

$$\ell_\alpha = (.\alpha_1\alpha_2\ldots\alpha_n)_3$$

and each $\alpha_i$ is 0 or 2. Thus the discarded interval is

$$(\ell_\alpha + 1/3^{n+1},\ \ell_\alpha + 2/3^{n+1}) = ((.\alpha 1)_3,\ (.\alpha 2)_3) = ((.\alpha 0\overline{2})_3,\ (.\alpha 2)_3)$$

since $1/3^{n+1} = \sum_{j=n+2}^{\infty} 2/3^j$. This expresses both endpoints base-3 using only the
numerals 0 and 2. Evaluating $H$ on them gives equal value:

$$H(\ell_\alpha + 1/3^{n+1}) = H((.\alpha 0\overline{2})_3) = \sum_{i=1}^{n} \frac{\alpha_i/2}{2^i} + \frac{0}{2^{n+1}} + \sum_{j=n+2}^{\infty} \frac{1}{2^j} = \sum_{i=1}^{n} \frac{\alpha_i/2}{2^i} + \frac{1}{2^{n+1}},$$

$$H(\ell_\alpha + 2/3^{n+1}) = H((.\alpha 2)_3) = \sum_{i=1}^{n} \frac{\alpha_i/2}{2^i} + \frac{1}{2^{n+1}}.$$

This verifies that the definition of H being constant on the discarded intervals makes 
sense and completes the proof that H is continuous on [0, 1]. 

It is clear that $H(0) = 0$ and

$$H(1) = H((.\overline{2})_3) = \sum_{i=1}^{\infty} \frac{1}{2^i} = 1.$$

Thus $H$ is surjective. If $x, x' \in C$ and $x < x'$ then it is also clear that $H(x) \le H(x')$,
which implies that $H$ is nondecreasing on $[0, 1]$. Since $H$ is constant on each interval
of the complement of the Cantor set, and the Cantor set is a zero set, its derivative
exists and is zero almost everywhere. □


A yet more pathological example is a strictly monotone, continuous function $J$
whose derivative is almost everywhere zero. Its graph is a sort of Devil’s ski slope,
almost everywhere level but also everywhere downhill. To construct $J$, start with $H$
and extend it to a function $\widehat{H} : \mathbb{R} \to \mathbb{R}$ by setting $\widehat{H}(x + n) = H(x) + n$ for all $n \in \mathbb{Z}$
and all $x \in [0, 1]$. Then set

$$J(x) = \sum_{k=0}^{\infty} \frac{\widehat{H}(3^k x)}{4^k}.$$

The values of $\widehat{H}(3^k x)$ for $x \in [0, 1]$ are $\le 3^k$, which is much smaller than the
denominator $4^k$. Thus the series converges and $J(x)$ is well defined. According to
the Weierstrass M-test, proved in the next chapter, $J$ is continuous. Since $\widehat{H}(3^k x)$
strictly increases for any pair of points at distance $> 1/3^k$ apart, and this fact is
preserved when we take sums, $J$ strictly increases. The proof that $J'(x) = 0$ almost
everywhere requires deeper theory. See Exercise 48 on page 456.

Next, we justify two common methods of integration. 

38 Integration by Substitution If $f \in \mathcal{R}$ and $g : [c, d] \to [a, b]$ is a continuously
differentiable bijection with $g' > 0$ ($g$ is a $C^1$ diffeomorphism) then

$$\int_a^b f(y)\,dy = \int_c^d f(g(x))\,g'(x)\,dx.$$

Proof The first integral exists by assumption. By Corollary 33 the composite $f \circ g$ is
Riemann integrable. Since $g'$ is continuous, the second integral exists by Corollary 27.
To show that the two integrals are equal we resort again to Riemann sums. Let $P$
partition the interval $[c, d]$ as

$$c = x_0 < x_1 < \dots < x_n = d$$

and choose $t_k \in [x_{k-1}, x_k]$ such that

$$g(x_k) - g(x_{k-1}) = g'(t_k)\,\Delta x_k.$$

The Mean Value Theorem ensures that such a $t_k$ exists. Since $g$ is a diffeomorphism
we have a partition $Q$ of the interval $[a, b]$

$$a = y_0 < y_1 < \dots < y_n = b$$

where $y_k = g(x_k)$, and mesh $P \to 0$ implies that mesh $Q \to 0$. Set $s_k = g(t_k)$. This
gives two equal Riemann sums

$$\sum_{k=1}^{n} f(s_k)\,\Delta y_k = \sum_{k=1}^{n} f(g(t_k))\,g'(t_k)\,\Delta x_k$$

which converge to the integrals $\int_a^b f(y)\,dy$ and $\int_c^d f(g(t))\,g'(t)\,dt$ as mesh $P \to 0$. Since
the limits of equals are equal, the integrals are equal. □
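
A quick numerical comparison of the two sides of the substitution formula may help fix the roles of $f$ and $g$. This is only a sketch under assumed choices: $f(y) = \cos y$ and $g(x) = x^2 + x$ on $[0, 1]$ (so $g' = 2x + 1 > 0$ and $g$ maps onto $[0, 2]$) are picked here for illustration, and both integrals are approximated by midpoint sums.

```python
import math


def midpoint_integral(func, a, b, n=20_000):
    """Midpoint Riemann sum of func over [a, b] using n equal subintervals."""
    h = (b - a) / n
    return h * sum(func(a + (k + 0.5) * h) for k in range(n))


def f(y):
    return math.cos(y)


def g(x):
    return x * x + x  # increasing on [0, 1], maps [0, 1] onto [0, 2]


def g_prime(x):
    return 2.0 * x + 1.0


def both_sides(c=0.0, d=1.0):
    """Return (integral of f over [g(c), g(d)], integral of f(g(x)) g'(x) over [c, d])."""
    left = midpoint_integral(f, g(c), g(d))
    right = midpoint_integral(lambda x: f(g(x)) * g_prime(x), c, d)
    return left, right


if __name__ == "__main__":
    left, right = both_sides()
    print(left, right, math.sin(2.0))
```

Both approximations should agree with each other and with the exact value sin 2 obtained from the antiderivative of the cosine.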


Actually, it is sufficient to assume that $g' \in \mathcal{R}$.


39 Integration by Parts If $f, g : [a, b] \to \mathbb{R}$ are differentiable and $f', g' \in \mathcal{R}$ then

$$\int_a^b f(x)\,g'(x)\,dx = f(b)g(b) - f(a)g(a) - \int_a^b f'(x)\,g(x)\,dx.$$

Proof Differentiability implies continuity implies integrability, so $f, g \in \mathcal{R}$. Since
the product of Riemann integrable functions is Riemann integrable, $f'g$ and $fg'$
are Riemann integrable. By the Leibniz Rule, $(fg)'(x) = f(x)g'(x) + f'(x)g(x)$
everywhere. That is, $fg$ is an antiderivative of $f'g + fg'$. The Antiderivative Theorem
states that $fg$ differs from the indefinite integral of $f'g + fg'$ by a constant. That is,
for all $t \in [a, b]$ we have

$$f(t)g(t) - f(a)g(a) = \int_a^t \bigl( f'(x)g(x) + f(x)g'(x) \bigr)\,dx = \int_a^t f'(x)g(x)\,dx + \int_a^t f(x)g'(x)\,dx.$$

Setting $t = b$ gives the result. □


Improper Integrals 


Assume that $f : [a, b) \to \mathbb{R}$ is Riemann integrable when restricted to any closed
subinterval $[a, c] \subset [a, b)$. You may imagine that $f(x)$ has some unpleasant behavior
as $x \to b$, such as $\limsup_{x \to b} |f(x)| = \infty$ and/or $b = \infty$. See Figure 83.


Figure 83 The improper integral converges if and only if the total undergraph area is finite.

If the limit of $\int_a^c f(x)\,dx$ exists (and is a real number) as $c \to b$ then it is natural
to define it as the improper Riemann integral

$$\int_a^b f(x)\,dx = \lim_{c \to b} \int_a^c f(x)\,dx.$$

In order that the two-sided improper integral exists for a function $f : (a, b) \to \mathbb{R}$
it is natural to fix some point $m \in (a, b)$ and require that both improper integrals
$\int_a^m f(x)\,dx$ and $\int_m^b f(x)\,dx$ exist. Their sum is the improper integral $\int_a^b f(x)\,dx$.
With some ingenuity you can devise a function $f : \mathbb{R} \to \mathbb{R}$ whose improper integral
$\int_{-\infty}^{\infty} f(x)\,dx$ exists despite the fact that $f$ is unbounded at both $\pm\infty$. See Exercise 55.
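
The limit in the definition can be watched numerically. The sketch below assumes the example integrand $1/\sqrt{1 - x}$ on $[0, 1)$, which is unbounded as $x \to 1$; it is chosen here for illustration and does not come from the text. The truncated integrals are approximated by midpoint sums and compared with the closed form $2 - 2\sqrt{1 - c}$ from the antiderivative $-2\sqrt{1 - x}$.

```python
import math


def midpoint_integral(func, a, b, n=100_000):
    """Midpoint Riemann sum of func over [a, b] using n equal subintervals."""
    h = (b - a) / n
    return h * sum(func(a + (k + 0.5) * h) for k in range(n))


def integrand(x):
    """Unbounded as x -> 1, but Riemann integrable on every [0, c] with c < 1."""
    return 1.0 / math.sqrt(1.0 - x)


def truncated_integral(c):
    """Approximate the integral of the integrand over [0, c], for 0 < c < 1."""
    return midpoint_integral(integrand, 0.0, c)


if __name__ == "__main__":
    for c in (0.9, 0.99, 0.999, 0.9999):
        exact = 2.0 - 2.0 * math.sqrt(1.0 - c)  # from the antiderivative
        print(c, truncated_integral(c), exact)
```

As $c$ approaches 1 the truncated integrals increase toward 2, so in this example the improper integral converges even though the integrand blows up at the endpoint.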


3 Series 


A series is a formal sum $\sum a_k$ where the terms $a_k$ are real numbers. The $n$th partial
sum of the series is

$$A_n = a_0 + a_1 + a_2 + \dots + a_n.$$

The series converges to $A$ if $A_n \to A$ as $n \to \infty$, and we write

$$A = \sum_{k=0}^{\infty} a_k.$$

A series that does not converge diverges. The basic question to ask about a series
is: Does it converge or diverge?

For example, if $\lambda$ is a constant and $|\lambda| < 1$ then the geometric series

$$\sum_{k=0}^{\infty} \lambda^k = 1 + \lambda + \dots + \lambda^n + \dots$$

converges to $1/(1 - \lambda)$. For its partial sums are

$$\Lambda_n = 1 + \lambda + \lambda^2 + \dots + \lambda^n = \frac{1 - \lambda^{n+1}}{1 - \lambda}$$

and $\lambda^{n+1} \to 0$ as $n \to \infty$. On the other hand, if $|\lambda| \ge 1$ then the series $\sum \lambda^k$ diverges.
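
The closed form for the geometric partial sums is easy to confirm by direct summation. The following is a minimal sketch; the function names are introduced here and the sample ratios are arbitrary.

```python
def geometric_partial_sum(lam, n):
    """Directly add 1 + lam + lam**2 + ... + lam**n."""
    return sum(lam ** k for k in range(n + 1))


def closed_form(lam, n):
    """The formula (1 - lam**(n + 1)) / (1 - lam), valid for lam != 1."""
    return (1.0 - lam ** (n + 1)) / (1.0 - lam)


if __name__ == "__main__":
    for lam in (0.5, -0.9, 0.99):
        for n in (10, 100, 1000):
            print(lam, n, geometric_partial_sum(lam, n), closed_form(lam, n), 1.0 / (1.0 - lam))
```

For each ratio with absolute value below 1, the partial sums approach $1/(1 - \lambda)$; the closer the ratio is to 1 in absolute value, the more terms are needed.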

Let $\sum a_k$ be a series. The Cauchy Convergence Criterion from Chapter 1 applied
to its sequence of partial sums yields the CCC for series

$$\sum a_k \text{ converges if and only if } \forall \epsilon > 0 \ \exists N \text{ such that } m, n \ge N \implies \left| \sum_{k=m}^{n} a_k \right| < \epsilon.$$
One immediate consequence of the CCC is that no finite number of terms affects
convergence of a series. Rather, it is the tail of the series, the terms $a_k$ with $k$ large,
that determines convergence or divergence. Likewise, whether the series leads off
with a term of index $k = 0$ or $k = 1$, etc. is irrelevant.


A second consequence of the CCC is that if $a_k$ does not converge to zero as
$k \to \infty$ then $\sum a_k$ does not converge. For Cauchyness of the partial sum sequence
$(A_n)$ implies that $a_n = A_n - A_{n-1}$ becomes small when $n \to \infty$. If $|\lambda| \ge 1$ then
the geometric series $\sum \lambda^k$ diverges since its terms do not converge to zero. The
harmonic series

$$\sum_{k=1}^{\infty} \frac{1}{k} = 1 + \frac{1}{2} + \frac{1}{3} + \dots$$

gives an example that a series can diverge even though its terms do tend to zero. See
below.


Series theory has a large number of convergence tests. All boil down to the 
following result. 

40 Comparison Test If a series $\sum b_k$ dominates a series $\sum a_k$ in the sense that
for all sufficiently large $k$ we have $|a_k| \le b_k$ then convergence of $\sum b_k$ implies
convergence of $\sum a_k$.


Proof Given $\epsilon > 0$, convergence of $\sum b_k$ implies there is a large $N$ such that for all
$m, n \ge N$ we have $\sum_{k=m}^{n} b_k < \epsilon$. Thus

$$\left| \sum_{k=m}^{n} a_k \right| \le \sum_{k=m}^{n} |a_k| \le \sum_{k=m}^{n} b_k < \epsilon$$

and convergence of $\sum a_k$ follows from the CCC. □


Example The series $\sum \sin(k)/2^k$ converges since it is dominated by the geometric
series $\sum 1/2^k$.
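
The domination in this example can be observed on blocks of terms, which is exactly what the CCC asks about. The sketch below is illustrative only; `partial_sum` and `geometric_block` are names chosen here, and the index ranges are arbitrary.

```python
import math


def partial_sum(m, n):
    """Sum of sin(k) / 2**k for k = m, ..., n."""
    return sum(math.sin(k) / 2.0 ** k for k in range(m, n + 1))


def geometric_block(m, n):
    """Sum of 1 / 2**k for k = m, ..., n, the dominating geometric block."""
    return sum(1.0 / 2.0 ** k for k in range(m, n + 1))


if __name__ == "__main__":
    for m, n in ((1, 10), (10, 40), (20, 60)):
        print(m, n, abs(partial_sum(m, n)), geometric_block(m, n))
```

Each block of the sine series is bounded in absolute value by the matching geometric block, which is itself less than $2^{1-m}$, so the blocks become small as $m$ grows.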


A series $\sum a_k$ converges absolutely if $\sum |a_k|$ converges. The comparison test
shows that absolute convergence implies convergence. A series that converges but not
absolutely converges conditionally. That is, $\sum a_k$ converges and $\sum |a_k|$ diverges.
See below.

Series and integrals are both infinite sums. You can imagine a series as an
improper integral in which the integration variable is an integer.
More precisely, given a series $\sum a_k$, define $f : [0, \infty) \to \mathbb{R}$ by setting

$$f(x) = a_k \quad \text{if } k - 1 < x \le k.$$

See Figure 84.


Figure 84 The pictorial proof of the integral test


Then

$$\sum_{k=0}^{\infty} a_k = \int_0^{\infty} f(x)\,dx.$$


The series converges if and only if the improper integral does. The natural interpre- 
tation of this picture is the 

41 Integral Test Suppose that $\int_0^{\infty} f(x)\,dx$ is a given improper integral and $\sum a_k$ is
a given series.

(a) If $|a_k| \le f(x)$ for all sufficiently large $k$ and all $x \in (k - 1, k]$ then convergence
of the improper integral implies convergence of the series.

(b) If $|f(x)| \le a_k$ for all sufficiently large $k$ and all $x \in [k, k + 1)$ then divergence
of the improper integral implies divergence of the series.

Proof (a) For some large $N_0$ and all $N \ge N_0$ we have

$$\sum_{k=N_0+1}^{N} |a_k| \le \int_{N_0}^{N} f(x)\,dx \le \int_0^{\infty} f(x)\,dx,$$

which is a finite real number. An increasing, bounded sequence converges to a limit, so
the tail of the series $\sum |a_k|$ converges and the whole series $\sum |a_k|$ converges. Absolute
convergence implies convergence.

The proof of (b) is left as Exercise 58. □


Example The p-series $\sum 1/k^p$ converges when $p > 1$ and diverges when $p \le 1$.

Case 1. $p > 1$. By the Fundamental Theorem of Calculus and differentiation,

$$\int_1^b \frac{1}{x^p}\,dx = \frac{b^{1-p} - 1}{1 - p} \to \frac{1}{p - 1}$$

as $b \to \infty$. The improper integral converges and dominates the p-series, which implies
convergence of the series by the integral test.

Case 2. $p \le 1$. The p-series dominates the improper integral

$$\int_1^b \frac{1}{x^p}\,dx = \begin{cases} \log b & \text{if } p = 1 \\ \dfrac{b^{1-p} - 1}{1 - p} & \text{if } p < 1. \end{cases}$$

As $b \to \infty$, these quantities blow up, and the integral test implies divergence of the
series. When $p = 1$ we have the harmonic series, which we have just shown to diverge.
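
The two cases can be compared with partial sums. The sketch below uses two standard consequences of the integral comparison, stated here as assumptions of the illustration: for $p = 2$ the partial sums stay below $1 + \int_1^N x^{-2}\,dx = 2 - 1/N$, and for $p = 1$ they are at least $\int_1^{N+1} dx/x = \log(N + 1)$. The function name is chosen for the example.

```python
import math


def p_series_partial_sum(p, n):
    """Sum of 1 / k**p for k = 1, ..., n."""
    return sum(1.0 / k ** p for k in range(1, n + 1))


if __name__ == "__main__":
    for n in (10, 1_000, 100_000):
        square_sum = p_series_partial_sum(2, n)
        harmonic_sum = p_series_partial_sum(1, n)
        # p = 2: bounded above by 2 - 1/n.  p = 1: bounded below by log(n + 1).
        print(n, square_sum, 2.0 - 1.0 / n, harmonic_sum, math.log(n + 1))
```

The partial sums for $p = 2$ level off below 2, while the harmonic partial sums keep pace with the logarithm and grow without bound.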


The exponential growth rate of the series $\sum a_k$ is

$$\alpha = \limsup_{k \to \infty} \sqrt[k]{|a_k|}.$$


Example or has exponential growth rate a. 

42 Root Test Let $\alpha$ be the exponential growth rate of a series $\sum a_k$. If $\alpha < 1$ then
the series converges, if $\alpha > 1$ then the series diverges, and if $\alpha = 1$ then the root test
is inconclusive.


Proof If $\alpha < 1$ then we fix a constant $\beta$ with

$$\alpha < \beta < 1.$$

Then for all large $k$ we have $|a_k|^{1/k} \le \beta$; i.e., $|a_k| \le \beta^k$, which gives convergence of
$\sum a_k$ by comparison to the geometric series $\sum \beta^k$.

If $\alpha > 1$, choose $\beta$ with $1 < \beta < \alpha$. Then $|a_k| \ge \beta^k$ for infinitely many $k$. Since
the terms $a_k$ do not converge to 0, the series diverges.
To show that the root test is inconclusive when $\alpha = 1$, we must find two series,
one convergent and the other divergent, both having exponential growth rate $\alpha = 1$.
The examples are p-series. We have

$$\log\left(\frac{1}{k^p}\right)^{1/k} = \frac{-p\log(k)}{k} \sim \frac{-p\log(x)}{x} \sim \frac{-p/x}{1} \sim 0$$

by L’Hôpital’s Rule as $k = x \to \infty$. Therefore $\alpha = \lim_{k \to \infty} (1/k^p)^{1/k} = 1$ for all
p-series. Since the square series $\sum 1/k^2$ converges and the harmonic series $\sum 1/k$
diverges the root test is inconclusive when $\alpha = 1$. □
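
The limit used in this argument can be tabulated for concrete exponents. The sketch below is illustrative; `kth_root_of_term` is a name introduced here, and the root is computed as $k^{-p/k}$ to avoid forming the tiny number $1/k^p$ first.

```python
def kth_root_of_term(p, k):
    """The k-th root of 1 / k**p, computed as k ** (-p / k)."""
    return float(k) ** (-p / k)


if __name__ == "__main__":
    for k in (10, 1_000, 1_000_000):
        # p = 1 is the divergent harmonic series, p = 2 the convergent square series.
        print(k, kth_root_of_term(1, k), kth_root_of_term(2, k))
```

Both columns approach 1, so the exponential growth rate cannot tell the convergent series from the divergent one.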


43 Ratio Test Let the ratio between successive terms of the series $\sum a_k$ be $r_k =
|a_{k+1}/a_k|$, and set

$$\limsup_{k \to \infty} r_k = \rho \qquad \liminf_{k \to \infty} r_k = \lambda.$$

If $\rho < 1$ then the series converges, if $\lambda > 1$ then the series diverges, and otherwise
the ratio test is inconclusive.


Proof If $\rho < 1$, choose $\beta$ with $\rho < \beta < 1$. For all $k \ge$ some $K$, $|a_{k+1}/a_k| < \beta$; i.e.,

$$|a_k| \le \beta^{k-K}\,|a_K| = C\beta^k$$

where $C = \beta^{-K}|a_K|$ is a constant. Convergence of $\sum a_k$ follows from comparison with
the geometric series $\sum C\beta^k$. If $\lambda > 1$, choose $\beta$ with $1 < \beta < \lambda$. Then $|a_k| \ge \beta^k/C$
for all large $k$, and $\sum a_k$ diverges because its terms do not converge to 0. Again the
p-series all have ratio limit $\rho = \lambda = 1$ and demonstrate the inconclusiveness of the
ratio test when $\rho = 1$ or $\lambda = 1$. □


Although it is usually easier to apply the ratio test than the root test, the latter 
has a strictly wider scope. See Exercises 61 and 65. 


Conditional Convergence 

If $(a_k)$ is a decreasing sequence in $\mathbb{R}$ that converges to 0 then its alternating
series

$$\sum (-1)^{k+1} a_k = a_1 - a_2 + a_3 - \dots$$

converges. For

$$A_{2n} = (a_1 - a_2) + (a_3 - a_4) + \dots + (a_{2n-1} - a_{2n})$$
Figure 85 The pictorial proof of alternating convergence


and $a_{k-1} - a_k$ is the length of the interval $I_k = (a_k, a_{k-1})$. The intervals are
disjoint, so the sum of their lengths is at most the length of $(0, a_1)$, namely $a_1$. See
Figure 85.

The sequence $(A_{2n})$ is increasing and bounded, so $\lim_{n \to \infty} A_{2n}$ exists. The partial
sum $A_{2n+1}$ differs from $A_{2n}$ by $a_{2n+1}$, a quantity that converges to 0 as $n \to \infty$, so

$$\lim_{n \to \infty} A_{2n} = \lim_{n \to \infty} A_{2n+1}$$

and the alternating series converges.

When $a_k = 1/k$ we have the alternating harmonic series,

$$\sum_{k=1}^{\infty} \frac{(-1)^{k+1}}{k} = 1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \dots$$

which we have just shown is convergent.
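
The behavior of the even and odd partial sums in this argument can be checked on the alternating harmonic series. The sketch below is an illustration; the function name is chosen here, and the comparison with log 2 uses the well-known value of this particular series, which the text above does not need.

```python
import math


def alternating_harmonic_partial_sums(n_terms):
    """Return [A_1, A_2, ..., A_n] for the series 1 - 1/2 + 1/3 - 1/4 + ..."""
    sums, total = [], 0.0
    for k in range(1, n_terms + 1):
        total += (-1) ** (k + 1) / k
        sums.append(total)
    return sums


if __name__ == "__main__":
    sums = alternating_harmonic_partial_sums(1000)
    even = sums[1::2]  # A_2, A_4, ..., increasing
    odd = sums[0::2]   # A_1, A_3, ..., decreasing
    print(even[:3], even[-1])
    print(odd[:3], odd[-1])
    print(math.log(2.0))
```

The even partial sums climb, the odd ones descend, and the gap between neighbors is the next term $1/k$, so both squeeze toward a common limit.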


Series of Functions 

A series of functions is of the form

$$\sum_{k=0}^{\infty} f_k(x),$$

where each term $f_k : (a, b) \to \mathbb{R}$ is a function. For example, in a power series

$$\sum c_k x^k$$

the functions are monomials $c_k x^k$. (The coefficients $c_k$ are constants and $x$ is a real
variable.) If you think of $\lambda = x$ as a variable then the geometric series is a power
series whose coefficients are 1, namely $\sum x^k$. Another example of a series of functions
is a Fourier series

$$\sum a_k \sin(kx) + b_k \cos(kx).$$
44 Radius of Convergence Theorem If $\sum c_k x^k$ is a power series then there is
a unique $R$ with $0 \le R \le \infty$, its radius of convergence, such that the series
converges whenever $|x| < R$, and diverges whenever $|x| > R$. Moreover $R$ is given by
the formula

$$R = \frac{1}{\limsup_{k \to \infty} \sqrt[k]{|c_k|}}.$$


Proof Apply the root test to the series $\sum c_k x^k$. Then

$$\limsup_{k \to \infty} \sqrt[k]{|c_k x^k|} = |x| \limsup_{k \to \infty} \sqrt[k]{|c_k|} = \frac{|x|}{R}.$$

If $|x| < R$ the root test gives convergence. If $|x| > R$ it gives divergence. □


There are power series with any given radius of convergence, $0 \le R \le \infty$. The
series $\sum k^k x^k$ has $R = 0$. The series $\sum x^k/\sigma^k$ has $R = \sigma$ for $0 < \sigma < \infty$. The series
$\sum x^k/k!$ has $R = \infty$. Eventually, we show that a function defined by a power series is
analytic: It has all derivatives at all points and it can be expanded as a Taylor series
at each point inside its radius of convergence, not merely at $x = 0$. See Section 6 in
Chapter 4.
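
The three examples can be explored with a finite-index stand-in for the radius formula. The sketch below is heuristic: it evaluates $1/\sqrt[k]{|c_k|}$ at a single large $k$ rather than taking a limsup, and it works with $\log |c_k|$ to avoid overflow. The function `radius_estimate` and the choice $\sigma = 3$ are assumptions made for the illustration.

```python
import math


def radius_estimate(log_abs_coeff, k):
    """Return 1 / |c_k|**(1/k), given a function that computes log|c_k|."""
    return math.exp(-log_abs_coeff(k) / k)


def log_coeff_power(k):
    return k * math.log(k)  # c_k = k**k


def log_coeff_sigma(k, sigma=3.0):
    return -k * math.log(sigma)  # c_k = 1 / sigma**k


def log_coeff_factorial(k):
    return -math.lgamma(k + 1)  # c_k = 1 / k!, since lgamma(k + 1) = log(k!)


if __name__ == "__main__":
    for k in (10, 100, 1000):
        print(
            k,
            radius_estimate(log_coeff_power, k),      # shrinks toward 0
            radius_estimate(log_coeff_sigma, k),      # stays at sigma = 3
            radius_estimate(log_coeff_factorial, k),  # grows without bound
        )
```

The three columns behave like the stated radii 0, $\sigma$, and $\infty$, although a single index can only suggest the limsup and does not prove it.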
Exercises


1. Assume that $f : \mathbb{R} \to \mathbb{R}$ satisfies $|f(t) - f(x)| \le |t - x|^2$ for all $t, x$. Prove that
$f$ is constant.

2. A function $f : (a, b) \to \mathbb{R}$ satisfies a Hölder condition of order $\alpha$ if $\alpha > 0$,
and for some constant $H$ and all $u, x \in (a, b)$ we have

$$|f(u) - f(x)| \le H\,|u - x|^{\alpha}.$$

The function is said to be $\alpha$-Hölder, with $\alpha$-Hölder constant $H$. (The terms
“Lipschitz function of order $\alpha$” and “$\alpha$-Lipschitz function” are sometimes used
with the same meaning.)

(a) Prove that an $\alpha$-Hölder function defined on $(a, b)$ is uniformly continuous
and infer that it extends uniquely to a continuous function defined on
$[a, b]$. Is the extended function $\alpha$-Hölder?

(b) What does $\alpha$-Hölder continuity mean when $\alpha = 1$?

(c) Prove that $\alpha$-Hölder continuity when $\alpha > 1$ implies that $f$ is constant.

3. Assume that $f : (a, b) \to \mathbb{R}$ is differentiable.

(a) If $f'(x) > 0$ for all $x$, prove that $f$ is strictly monotone increasing.

(b) If $f'(x) \ge 0$ for all $x$, what can you prove?

4. Prove that $\sqrt{n + 1} - \sqrt{n} \to 0$ as $n \to \infty$.

5. Assume that $f : \mathbb{R} \to \mathbb{R}$ is continuous, and for all $x \ne 0$, $f'(x)$ exists. If
$\lim_{x \to 0} f'(x) = L$ exists, does it follow that $f'(0)$ exists? Prove or disprove.

6. In L’Hôpital’s Rule, replace the interval $(a, b)$ with the half-line $(a, \infty)$ and
interpret “$x$ tends to $b$” as “$x \to \infty$.” Show that if $f/g$ tends to $0/0$ and $f'/g'$
tends to $L$ then $f/g$ tends to $L$ also. Prove that this continues to hold when
$L = \infty$ in the sense that if $f'/g' \to \infty$ then $f/g \to \infty$.

7. In L’Hôpital’s Rule, replace the assumption that $f/g$ tends to $0/0$ with the
assumption that it tends to $\infty/\infty$. If $f'/g'$ tends to $L$, prove that $f/g$ tends
to $L$ also. [Hint: Think of a rear guard instead of an advance guard.] [Query:
Is there a way to deduce the $\infty/\infty$ case from the $0/0$ case? Naively taking
reciprocals does not work.]

8. (a) Draw the graph of a continuous function defined on $[0, 1]$ that is
differentiable on the interval $(0, 1)$ but not at the endpoints.

(b) Can you find a formula for such a function?

(c) Does the Mean Value Theorem apply to such a function?

9. Assume that $f : \mathbb{R} \to \mathbb{R}$ is differentiable.

(a) If there is an $L < 1$ such that for each $x \in \mathbb{R}$ we have $f'(x) < L$, prove
that there exists a unique point $x$ such that $f(x) = x$. [$x$ is a fixed point
for $f$.]

(b) Show by example that (a) fails if $L = 1$.
10. Concoct a function $f : \mathbb{R} \to \mathbb{R}$ with a discontinuity of the second kind at $x = 0$
such that $f$ does not have the intermediate value property there. Infer that it
is incorrect to assert that functions without jumps are Darboux continuous.

*11. Let $f : (a, b) \to \mathbb{R}$ be given.

(a) If $f''(x)$ exists, prove that

$$\lim_{h \to 0} \frac{f(x - h) - 2f(x) + f(x + h)}{h^2} = f''(x).$$

(b) Find an example that this limit can exist even when $f''(x)$ fails to exist.

*12. Assume that $f : (-1, 1) \to \mathbb{R}$ and $f'(0)$ exists. If $\alpha_n, \beta_n \to 0$ as $n \to \infty$, define
the difference quotient

$$D_n = \frac{f(\beta_n) - f(\alpha_n)}{\beta_n - \alpha_n}.$$

(a) Prove that $\lim_{n \to \infty} D_n = f'(0)$ under each of the following conditions.

(i) $\alpha_n < 0 < \beta_n$.

(ii) $0 < \alpha_n < \beta_n$ and $\dfrac{\beta_n}{\beta_n - \alpha_n} \le M$.

(iii) $f'(x)$ exists and is continuous for all $x \in (-1, 1)$.

(b) Set $f(x) = x^2 \sin(1/x)$ for $x \ne 0$ and $f(0) = 0$. Observe that $f$ is
differentiable everywhere in $(-1, 1)$ and $f'(0) = 0$. Find $\alpha_n, \beta_n$ that tend to 0 in
such a way that $D_n$ converges to a limit unequal to $f'(0)$.

13. Assume that $f$ and $g$ are $r$th order differentiable functions $(a, b) \to \mathbb{R}$, $r \ge 1$.
Prove the Higher-Order Leibniz Product Rule for the function $f \cdot g$,

$$(f \cdot g)^{(r)}(x) = \sum_{k=0}^{r} \binom{r}{k} f^{(k)}(x)\,g^{(r-k)}(x),$$

where $\binom{r}{k} = r!/(k!(r - k)!)$ is the binomial coefficient, $r$ choose $k$. [Hint:
Induction.]

14. For each $r \ge 1$, find a function that is $C^r$ but not $C^{r+1}$.

15. Define $f(x) = x^2$ if $x < 0$ and $f(x) = x + x^2$ if $x \ge 0$. Differentiation gives
$f''(x) = 2$. This is bogus. Why?

16. $\log x$ is defined to be $\int_1^x 1/t\,dt$ for $x > 0$. Using only the mathematics explained
in this chapter,

(a) Prove that log is a smooth function.

(b) Prove that $\log(xy) = \log x + \log y$ for all $x, y > 0$. [Hint: Fix $y$ and define
$f(x) = \log(xy) - \log x - \log y$. Show that $f(x) = 0$.]

(c) Prove that log is strictly monotone increasing and its range is all of $\mathbb{R}$.
17. Define $e : \mathbb{R} \to \mathbb{R}$ by

$$e(x) = \begin{cases} e^{-1/x} & \text{if } x > 0 \\ 0 & \text{if } x \le 0 \end{cases}$$

(a) Prove that $e$ is smooth; that is, $e$ has derivatives of all orders at all points
$x$. [Hint: L’Hôpital and induction. Feel free to use the standard
differentiation formulas about $e^x$ from calculus.]

(b) Is $e$ analytic?

(c) Show that the bump function

$$\beta(x) = e^2\, e(1 - x) \cdot e(x + 1)$$

is smooth, identically zero outside the interval $(-1, 1)$, positive inside the
interval $(-1, 1)$, and takes value 1 at $x = 0$.† ($e^2$ is the square of the base of
the natural logarithms, while $e(x)$ is the function just defined. Apologies
to the abused notation.)

(d) For $|x| < 1$ show that

$$\beta(x) = e^{2x^2/(x^2 - 1)}.$$

Bump functions have wide use in smooth function theory and differential
topology. The graph of $\beta$ looks like a bump. See Figure 86.


Figure 86 The graph of the bump function $\beta$


**18. Let $L$ be any closed set in $\mathbb{R}$. Prove that there is a smooth function $f : \mathbb{R} \to [0, 1]$
such that $f(x) = 0$ if and only if $x \in L$. To put it another way, every closed set
in $\mathbb{R}$ is the zero locus of some smooth function. [Hint: Use Exercise 17(c).]


†The support of a function is the closure of the set of points at which the function is nonzero.
The support of $\beta$ is $[-1, 1]$.
19. Recall that the oscillation of an arbitrary function $f : [a, b] \to \mathbb{R}$ at $x$ is

$$\operatorname{osc}_x f = \limsup_{t \to x} f(t) - \liminf_{t \to x} f(t).$$

In the proof of the Riemann-Lebesgue Theorem $D_k$ refers to the set of points
with oscillation $\ge 1/k$.

(a) Prove that $D_k$ is closed.

(b) Infer that the discontinuity set of $f$ is a countable union of closed sets.
(This is called an $F_\sigma$-set.)

(c) Infer from (b) that the set of continuity points is a countable intersection
of open sets. (This is called a $G_\delta$-set.)

20. Baire’s Theorem (page 256) asserts that if a complete metric space is the
countable union of closed subsets then at least one of them has nonempty interior.
Use Baire’s Theorem to show that the set of irrational numbers is not the
countable union of closed subsets of $\mathbb{R}$.

*21. Use Exercises 19 and 20 to show there is no function $f : \mathbb{R} \to \mathbb{R}$ which is
discontinuous at every irrational number and continuous at every rational number.

22. Show that there exists a subset $S$ of the middle-thirds Cantor set which is never
the discontinuity set of a function $f : \mathbb{R} \to \mathbb{R}$. Infer that some zero sets are
never discontinuity sets of Riemann integrable functions. [Hint: How many
subsets of $C$ are there? How many can be countable unions of closed sets?]

23. Suppose that $f_n : [a, b] \to \mathbb{R}$ is a sequence of continuous functions that converges
pointwise to a limit function $f : [a, b] \to \mathbb{R}$. Such an $f$ is said to be of Baire
class 1. (Pointwise convergence is discussed in the next chapter. It means
what it says: For each $x$, $f_n(x)$ converges to $f(x)$ as $n \to \infty$. Continuous
functions are considered to be of Baire class 0, and in general a Baire class
$r$ function is the pointwise limit of a sequence of Baire class $r - 1$ functions.
Strictly speaking, it should not be of Baire class $r - 1$ itself, but for simplicity I
include continuous functions among Baire class 1 functions. It is an interesting
fact that for every $r$ there are Baire class $r$ functions not of Baire class $r - 1$.
You might consult A Primer of Real Functions by Ralph Boas.)

Prove that the set $D_k$ of discontinuity points with oscillation $\ge 1/k$ is nowhere
dense, as follows. To arrive at a contradiction, suppose that $D_k$ is dense in
some interval $(\alpha, \beta) \subset [a, b]$. By Exercise 19, $D_k$ is closed, so it contains $(\alpha, \beta)$.
Cover $\mathbb{R}$ by countably many intervals $(a_\ell, b_\ell)$ of length $< 1/k$ and set

$$H_\ell = \{x \in [a, b] : a_\ell < f(x) < b_\ell\}.$$

(a) Why does $\bigcup_\ell H_\ell = [a, b]$?

(b) Show that no $H_\ell$ contains a subinterval of $(\alpha, \beta)$.

(c) Why are

$$F_{\ell m n} = \left\{x \in [a, b] : a_\ell + \frac{1}{m} \le f_n(x) \le b_\ell - \frac{1}{m}\right\} \qquad E_{\ell m N} = \bigcap_{n \ge N} F_{\ell m n}$$

closed?

(d) Show that

$$H_\ell = \bigcup_{m, N \in \mathbb{N}} E_{\ell m N}.$$

(e) Use (a) and Baire’s Theorem (page 243) to deduce that some $E_{\ell m N}$
contains a subinterval of $(\alpha, \beta)$.

(f) Why does (e) contradict (b) and complete the proof that $D_k$ is nowhere
dense?

24. Combine Exercises 19, 23, and Baire’s Theorem to show that a Baire class 1
function has a dense set of continuity points.

25. Suppose that $g : [a, b] \to \mathbb{R}$ is differentiable.

(a) Prove that $g'$ is of Baire class 1. [Hint: Extend $g$ to a differentiable function
defined on a larger interval and consider

$$f_n(x) = \frac{g(x + 1/n) - g(x)}{1/n}$$

for $x \in [a, b]$. Is $f_n(x)$ continuous? Does $f_n(x)$ converge pointwise to $g'(x)$
as $n \to \infty$?]

(b) Infer from Exercise 24 that a derivative cannot be everywhere discontinuous.
It must be continuous on a dense subset of its domain of definition.

26. Redefine Riemann and Darboux integrability using dyadic partitions.

(a) Prove that the integrals are unaffected.

(b) Infer that Riemann’s integrability criterion can be restated in terms of
dyadic partitions.

(c) Repeat the analysis using only partitions of $[a, b]$ into subintervals of length
$(b - a)/n$.

27. In many calculus books, the definition of the integral is given as

$$\int_a^b f(x)\,dx = \lim_{n \to \infty} \sum_{k=1}^{n} f(x_k^*)\,\frac{b - a}{n}$$

where $x_k^*$ is the midpoint of the $k$th interval of $[a, b]$ having length $(b - a)/n$,
namely

$$[a + (k - 1)(b - a)/n,\ a + k(b - a)/n].$$

See Stewart’s Calculus with Early Transcendentals, for example.

(a) If $f$ is continuous, show that the calculus-style limit exists and equals the
Riemann integral of $f$. [Hint: This is a one-liner.]

(b) Show by example that the calculus-style limit can exist for functions which
are not Riemann integrable.

(c) Infer that the calculus-style definition of the integral is inadequate for real
analysis.

28. Suppose that $Z \subset \mathbb{R}$. Prove that the following are equivalent.

(i) $Z$ is a zero set.

(ii) For each $\epsilon > 0$ there is a countable covering of $Z$ by closed intervals $[a_i, b_i]$
with total length $\sum b_i - a_i < \epsilon$.

(iii) For each $\epsilon > 0$ there is a countable covering of $Z$ by sets $S_i$ with total
diameter $\sum \operatorname{diam} S_i < \epsilon$.

29. Prove that the interval $[a, b]$ is not a zero set.

(a) Explain why the following observation is not a solution to the problem:
“Every open interval that contains $[a, b]$ has length $> b - a$.”

(b) Instead, suppose there is a “bad” covering of $[a, b]$ by open intervals $\{I_i\}$
whose total length is $< b - a$, and justify the following steps.

(i) It is enough to deal with finite bad coverings.

(ii) Let $\mathcal{B} = \{I_1, \ldots, I_n\}$ be a bad covering such that $n$ is minimal among
all bad coverings.

(iii) Show that no bad covering has $n = 1$ so we have $n \ge 2$.

(iv) Show that it is no loss of generality to assume $a \in I_1$ and $I_1 \cap I_2 \ne \emptyset$.

(v) Show that $I = I_1 \cup I_2$ is an open interval and $|I| < |I_1| + |I_2|$.

(vi) Show that $\mathcal{B}' = \{I, I_3, \ldots, I_n\}$ is a bad covering of $[a, b]$ with fewer
intervals, a contradiction to minimality of $n$.

30. The standard middle-quarters Cantor set $Q$ is formed by removing the
middle quarter from $[0, 1]$, then removing the middle quarter from each of the
remaining two intervals, then removing the middle quarter from each of the
remaining four intervals, and so on.

(a) Prove that $Q$ is a zero set.

(b) Formulate the natural definition of the middle $\beta$-Cantor set.

(c) Is it also a zero set? Prove or disprove.

31. Define a Cantor set by removing from $[0, 1]$ the middle interval of length 1/4.
From the remaining two intervals $F^1$ remove the middle intervals of length
1/16. From the remaining four intervals $F^2$ remove the middle intervals of
length 1/64, and so on. At the $n$th step in the construction $F^n$ consists of $2^n$
subintervals of $F^{n-1}$.

(a) Prove that $F = \bigcap F^n$ is a Cantor set but not a zero set. It is referred to
as a fat Cantor set.

(b) Infer that being a zero set is not a topological property: If two sets are
homeomorphic and one is a zero set then the other need not be a zero set.
[Hint: To get a sense of this fat Cantor set, calculate the total length of the
intervals which comprise its complement. See Figure 52 and Exercise 35.]
Consider the characteristic function of the dyadic rational numbers, f{pc) — 1 
if x — k/2 n for some k G Z and n G N, and f(x) — 0 otherwise. 

(a) What is its set of discontinuities? 

(b) At which points is its oscillation > e? 

(c) Is it integrable? Explain, both by the Riemann-Lebesgue Theorem and 
directly from the definition. 

(d) Consider the dyadic ruler function g(x) — l/2 n if x — k/2 n and g(x) = 
0 otherwise. Graph it and answer the questions posed in (a), (b), (c). 

(a) Prove that the characteristic function / of the middle-thirds Cantor set C 
is Riemann integrable but the characteristic function g of the fat Cantor 
set F (Exercise 31) is not. 

(b) Why is there a homeomorphism h : [0, 1] [0, 1] sending C onto F7 

(c) Infer that the composite of Riemann integrable functions need not be Rie- 
mann integrable. How is this example related to Corollaries 28 and 32 of 
the Riemann-Lebesgue Theorem? See also Exercise 35. 

Assume that ip : [a, b] — > R is continuously differentiable. A critical point of 
-0 is an x such that ip'(x) — 0. A critical value is a number y such that for at 
least one critical point x we have y — i/j(x). 

(a) Prove that the set of critical values is a zero set. (This is the Morse-Sard 
Theorem in dimension one.) 

(b) Generalize this to continuously differentiable functions R — > R. 

Let F C [0, 1] be the fat Cantor set from Exercise 31, and define 


rx 

ip(x) — / dist(£, F) dt 

J o 

where dist (t, F) refers to the minimum distance from t to F. 

(a) Why is ib a continuously differentiable homeomorphism from [0,11 onto 
[0, L\ where L — ^(1)? 

(b) What is the set of critical points of -0? (See Exercise 34.) 

(c) Why is 'ifj(F) a Cantor set of zero measure? 

(d) Let / be the characteristic function of 'ifi(F). Why is / Riemann integrable 
but / o ^ not? 

(e) What is the relation of (d) to Exercise 33? 

See also Exercise 6.77. 

36. Generalizing Exercise 1.31, we say that $f : (a, b) \to \mathbb{R}$ has a jump discontinuity 
(or a discontinuity of the first kind) at $c \in (a, b)$ if 

$$f(c^-) = \lim_{x \to c^-} f(x) \quad \text{and} \quad f(c^+) = \lim_{x \to c^+} f(x)$$

exist, but are either unequal or are unequal to $f(c)$. (The three quantities exist 
and are equal if and only if $f$ is continuous at $c$.) An oscillating discontinuity 
(or a discontinuity of the second kind) is any nonjump discontinuity. 

(a) Show that $f : \mathbb{R} \to \mathbb{R}$ has at most countably many jump discontinuities. 

(b) What about the function 

$$f(x) = \begin{cases} \sin \dfrac{1}{x} & \text{if } x > 0 \\ 0 & \text{if } x \leq 0? \end{cases}$$

(c) What about the characteristic function of the rationals? 

*37. Suppose that $f : \mathbb{R} \to \mathbb{R}$ has no jump discontinuities. Does $f$ have the 
intermediate value property? (Proof or counterexample.) 

38. Recall that $\mathcal{P}(S) = 2^S$ is the power set of $S$, the collection of all subsets of $S$, 
and $\mathcal{R}$ is the set of Riemann integrable functions $f : [a, b] \to \mathbb{R}$. 

(a) Prove that the cardinality of $\mathcal{R}$ is the same as the cardinality of $\mathcal{P}(\mathbb{R})$, 
which is greater than the cardinality of $\mathbb{R}$. 

(b) Call two functions in $\mathcal{R}$ integrally equivalent if they differ only on a 
zero set. Prove that the collection of integral equivalence classes of $\mathcal{R}$ has 
the same cardinality as $\mathbb{R}$, namely $2^{\mathbb{N}}$. 

(c) Is it better to count Riemann integrable functions or integral equivalence 
classes of Riemann integrable functions? 

(d) Show that $f, g \in \mathcal{R}$ are integrally equivalent if and only if the integral of 
$|f - g|$ is zero. 

39. Consider the characteristic functions $f(x)$ and $g(x)$ of the intervals $[1, 4]$ and 
$[2, 5]$. The derivatives $f'$ and $g'$ exist almost everywhere. The integration-by-parts 
formula says that 

$$\int_0^3 f(x) g'(x)\, dx = f(3)g(3) - f(0)g(0) - \int_0^3 f'(x) g(x)\, dx.$$

But both integrals are zero, while $f(3)g(3) - f(0)g(0) = 1$. Where is the error? 

40. Set 

$$f(x) = \begin{cases} 0 & \text{if } x \leq 0 \\ \sin \dfrac{\pi}{x} & \text{if } x > 0 \end{cases} \quad \text{and} \quad g(x) = \begin{cases} 0 & \text{if } x \leq 0 \\ 1 & \text{if } x > 0. \end{cases}$$

Prove that $f$ has an antiderivative but $g$ does not. 

41. Show that any two antiderivatives of a function differ by a constant. [Hint: 
This is a one-liner.] 

42. Suppose that $\psi : [c, d] \to [a, b]$ is continuous and for every zero set $Z \subset [a, b]$, 
$\psi^{\mathrm{pre}}(Z)$ is a zero set in $[c, d]$. 

(a) If $f$ is Riemann integrable, prove that $f \circ \psi$ is Riemann integrable. 

(b) Derive Corollary 32 from (a). 
43. Let $\psi(x) = x \sin 1/x$ for $0 < x \leq 1$ and $\psi(0) = 0$. 

(a) If $f : [-1, 1] \to \mathbb{R}$ is Riemann integrable, prove that $f \circ \psi$ is Riemann 
integrable. 

(b) What happens for $\psi(x) = \sqrt{x} \sin 1/x$? 

*44. Assume that $\psi : [c, d] \to [a, b]$ is continuously differentiable. 

(a) If the critical points of $\psi$ form a zero set in $[c, d]$ and $f$ is Riemann 
integrable on $[a, b]$, prove that $f \circ \psi$ is Riemann integrable on $[c, d]$. 

(b) Conversely, prove that if $f \circ \psi$ is Riemann integrable for each Riemann 
integrable $f$ on $[a, b]$, then the critical points of $\psi$ form a zero set. [Hint: 
Think in terms of Exercise 34.] 

(c) Prove (a) and (b) under the weaker assumption that $\psi$ is continuously 
differentiable except at finitely many points of $[c, d]$. 

(d) Derive part (a) of Exercise 35 from (c). 

(e) Weaken the assumption further to $\psi$ being continuously differentiable on 
an open subset of $[c, d]$ whose complement is a zero set. 


Remark The following assertion, to be proved in Chapter 6, is related to the 
preceding exercises. If $f : [a, b] \to \mathbb{R}$ satisfies a Lipschitz condition or is 
monotone then the set of points at which $f'(x)$ fails to exist is a zero set. 
Thus: “A Lipschitz function is differentiable almost everywhere,” which is 
Rademacher’s Theorem in dimension 1, and a “monotone function is almost 
everywhere differentiable,” which is the last theorem in Lebesgue’s book, 
Leçons sur l’intégration. See Theorem 6.57 and Corollary 6.59. 

45. (a) Define the oscillation for a function from one metric space to another, 
$f : M \to N$. 

(b) Is it true that $f$ is continuous at a point if and only if its oscillation is zero 
there? Prove or disprove. 

(c) Is the set of points at which the oscillation of $f$ is $> 1/k$ closed in $M$? 
Prove or disprove. 

46. (a) Prove that the integral of the Zeno’s staircase function described on page 174 
is 2/3. 

(b) What about the Devil’s staircase? 

47. In the proof of Corollary 28 of the Riemann-Lebesgue Theorem, it is asserted 
that when $\phi$ is continuous the discontinuity set of $\phi \circ f$ is contained in the 
discontinuity set of $f$. 

(a) Prove this. 

(b) Give an example where the inclusion is not an equality. 

(c) Find a sufficient condition on $\phi$ so that $\phi \circ f$ and $f$ have equal discontinuity 
sets for all $f \in \mathcal{R}$. 

(d) Is your condition necessary too? 
48. Assume that $f \in \mathcal{R}$ and for some $m > 0$ we have $|f(x)| \geq m$ for all $x \in [a, b]$. 
Prove that the reciprocal of $f$, $1/f(x)$, also belongs to $\mathcal{R}$. If $f \in \mathcal{R}$, $|f(x)| > 0$, 
but no $m > 0$ is an underbound for $|f|$, prove that the reciprocal of $f$ is not 
Riemann integrable. 

49. Corollary 28 to the Riemann-Lebesgue Theorem asserts that if $f \in \mathcal{R}$ and $\phi$ 
is continuous, then $\phi \circ f \in \mathcal{R}$. Show that piecewise continuity cannot replace 
continuity. [Hint: Take $f$ to be a ruler function and $\phi$ to be a characteristic 
function.] 

50. Assume that $f : [a, b] \to [c, d]$ is a Riemann integrable bijection. Is the inverse 
bijection also Riemann integrable? Prove or disprove. 

51. If $f, g$ are Riemann integrable on $[a, b]$, and $f(x) < g(x)$ for all $x \in [a, b]$, prove 
that $\int_a^b f(x)\, dx < \int_a^b g(x)\, dx$. (Note the strict inequality.) 

52. Let $f : [a, b] \to \mathbb{R}$ be given. Prove or give counterexamples to the following 
assertions. 

(a) $f \in \mathcal{R} \Rightarrow |f| \in \mathcal{R}$. 

(b) $|f| \in \mathcal{R} \Rightarrow f \in \mathcal{R}$. 

(c) $f \in \mathcal{R}$ and $|f(x)| \geq c > 0$ for all $x$ $\Rightarrow$ $1/f \in \mathcal{R}$. 

(d) $f \in \mathcal{R} \Rightarrow f^2 \in \mathcal{R}$. 

(e) $f^2 \in \mathcal{R} \Rightarrow f \in \mathcal{R}$. 

(f) $f^3 \in \mathcal{R} \Rightarrow f \in \mathcal{R}$. 

(g) $f^2 \in \mathcal{R}$ and $f(x) \geq 0$ for all $x$ $\Rightarrow$ $f \in \mathcal{R}$. 

[Here $f^2$ and $f^3$ refer to the functions $f(x) \cdot f(x)$ and $f(x) \cdot f(x) \cdot f(x)$, 
not the iterates.] 

53. Given $f, g \in \mathcal{R}$, prove that $\max(f, g)$ and $\min(f, g)$ are Riemann integrable, 
where $\max(f, g)(x) = \max(f(x), g(x))$ and $\min(f, g)(x) = \min(f(x), g(x))$. 

54. Assume that $f, g : [0, 1] \to \mathbb{R}$ are Riemann integrable and $f(x) = g(x)$ except 
on the middle-thirds Cantor set $C$. 

(a) Prove that $f$ and $g$ have the same integral. 

(b) Is the same true if $f(x) = g(x)$ except for $x \in \mathbb{Q}$? 

(c) How is this related to the fact that the characteristic function of $\mathbb{Q}$ is not 
Riemann integrable? 

55. Invent a continuous function $f : \mathbb{R} \to \mathbb{R}$ whose improper integral is zero, but 
which is unbounded as $x \to -\infty$ and $x \to \infty$. [Hint: $f$ is far from monotone.] 

56. Assume that $f : \mathbb{R} \to \mathbb{R}$ and that the restriction of $f$ to each closed interval is 
Riemann integrable. 

(a) Formulate the concepts of conditional and absolute convergence of the 
improper Riemann integral of $f$. 

(b) Find an example that distinguishes them. 
57. Construct a function $f : [-1, 1] \to \mathbb{R}$ such that 

$$\lim_{r \to 0} \left( \int_{-1}^{-r} f(x)\, dx + \int_r^1 f(x)\, dx \right)$$

exists (and is a finite real number) but the improper integral $\int_{-1}^1 f(x)\, dx$ does 
not exist. Do the same for a function $g : \mathbb{R} \to \mathbb{R}$ such that 

$$\lim_{R \to \infty} \int_{-R}^{R} g(x)\, dx$$

exists but the improper integral $\int_{-\infty}^{\infty} g(x)\, dx$ fails to exist. [Hint: The functions 
are not symmetric across 0.] 

58. Let $f : [0, \infty) \to [0, \infty)$ and $\sum a_k$ be given. Assume that for all sufficiently 
large $k$ and all $x \in [k, k + 1)$ we have $f(x) \leq a_k$. Prove that divergence of the 
improper integral $\int_0^\infty f(x)\, dx$ implies divergence of $\sum a_k$. 

59. Prove that if $a_n \geq 0$ and $\sum a_n$ converges then $\sum \sqrt{a_n}/n$ converges. 

60. (a) If $\sum a_n$ converges and $(b_n)$ is monotonic and bounded, prove that $\sum a_n b_n$ 
converges. 

(b) If the monotonicity condition is dropped, or replaced by the assumption 
that $\lim_{n \to \infty} b_n = 0$, find a counterexample to convergence of $\sum a_n b_n$. 

61. Find an example of a series of positive terms that converges despite the fact 
that $\limsup_{n \to \infty} |a_{n+1}/a_n| = \infty$. Infer that $\rho$ cannot replace $\lambda$ in the divergence 
half of the ratio test. 

62. Prove that if the terms of a sequence decrease monotonically, $a_1 \geq a_2 \geq \dots$, 
and converge to 0 then the series $\sum a_k$ converges if and only if the associated 
dyadic series 

$$a_1 + 2a_2 + 4a_4 + 8a_8 + \cdots = \sum 2^k a_{2^k}$$

converges. (I call this the block test because it groups the terms of the series 
in blocks of length $2^{k-1}$.) 

63. Prove that $\sum 1/(k (\log k)^p)$ converges when $p > 1$ and diverges when $p \leq 1$. 
Here $k = 2, 3, \dots$. [Hint: Integral test or block test.] 

64. Concoct a series $\sum a_k$ such that $(-1)^k a_k > 0$, $a_k \to 0$, but the series diverges. 

65. Compare the root and ratio tests. 

(a) Show that if a series has exponential growth rate $\rho$ then it has ratio lim sup 
$\geq \rho$. Infer that the ratio test is subordinate to the root test. 

(b) Concoct a series such that the root test is conclusive but the ratio test is 
not. Infer that the root test has strictly wider scope than the ratio test. 

66. Show that there is no simple comparison test for conditionally convergent series: 

(a) Find two series $\sum a_k$ and $\sum b_k$ such that $\sum b_k$ converges conditionally, 
$a_k/b_k \to 1$ as $k \to \infty$, and $\sum a_k$ diverges. 

(b) Why is this impossible if the series $\sum b_k$ is absolutely convergent? 

67. An infinite product is an expression $\prod c_k$ where $c_k > 0$. The $n$th partial 
product is $C_n = c_1 \cdots c_n$. If $C_n$ converges to a limit $C \neq 0$ then the product 
converges to $C$. Write $c_k = 1 + a_k$. If each $a_k \geq 0$ or each $a_k \leq 0$ prove that 
$\sum a_k$ converges if and only if $\prod c_k$ converges. [Hint: Take logarithms.] 

68. Show that conditional convergence of the series $\sum a_k$ and the product $\prod (1 + a_k)$ 
can be unrelated to each other: 

(a) Set $a_k = (-1)^k/\sqrt{k}$. The series $\sum a_k$ converges but the corresponding 
product $\prod (1 + a_k)$ diverges. [Hint: Group the terms in the product two 
at a time.] 

(b) Let $e_k = 0$ when $k$ is odd and $e_k = 1$ when $k$ is even. Set $b_k = 
e_k/k + (-1)^k/\sqrt{k}$. The series $\sum b_k$ diverges while the corresponding 
product $\prod (1 + b_k)$ converges. 

69. Consider a series $\sum a_k$ and rearrange its terms by some bijection $\beta : \mathbb{N} \to \mathbb{N}$, 
forming a new series $\sum a_{\beta(k)}$. The rearranged series converges if and only if the 
partial sums $a_{\beta(1)} + \dots + a_{\beta(n)}$ converge to a limit as $n \to \infty$. 

(a) Prove that every rearrangement of a convergent series of nonnegative terms 
converges - and converges to the same sum as the original series. 

(b) Do the same for absolutely convergent series. 

*70. Let $\sum a_k$ be given. 

(a) If $\sum a_k$ converges conditionally, prove that rearrangement totally alters its 
convergence in the sense that some rearrangements $\sum b_k$ of $\sum a_k$ diverge to 
$+\infty$, others diverge to $-\infty$, and others converge to any given real number. 

(b) Infer that a series is absolutely convergent if and only if every rearrange- 
ment converges. (The fact that rearrangement radically alters conditional 
convergence shows that although finite addition is commutative, infinite 
addition (i.e. , summing a series) is not.) 

**71. Suppose that $\sum a_k$ converges conditionally. If $\sum b_k$ is a rearrangement of $\sum a_k$, 
let $Y$ be the set of subsequential limits of $(B_n)$ where $B_n$ is the $n$th partial sum 
of $\sum b_k$. That is, $y \in Y$ if and only if some $B_{n_\ell} \to y$ as $\ell \to \infty$. 

(a) Prove that $Y$ is closed and connected. 

(b) If $Y$ is compact and nonempty, prove that $\sum b_k$ converges to $Y$ in the 
sense that $d_H(Y_n, Y) \to 0$ as $n \to \infty$, where $d_H$ is the Hausdorff metric on 
the space of compact subsets of $\mathbb{R}$ and $Y_n$ is the closure of $\{B_m : m \geq n\}$. 
See Exercise 2.147. 

(c) Prove that each closed and connected subset of $\mathbb{R}$ is the set of subsequential 
limits of some rearrangement of $\sum a_k$. 

The article, “The Remarkable Theorem of Levy and Steinitz” by Peter 
Rosenthal in the American Math Monthly of April 1987 deals with some 
of these issues, including the higher dimensional situation. 
**72. Absolutely convergent series can be multiplied in a natural way, the result being 
their Cauchy product, 

$$\left( \sum_{i=0}^{\infty} a_i \right) \left( \sum_{j=0}^{\infty} b_j \right) = \sum_{k=0}^{\infty} c_k$$

where $c_k = a_0 b_k + a_1 b_{k-1} + \cdots + a_k b_0$. 

(a) Prove that $\sum c_k$ converges absolutely. 

(b) Formulate some algebraic laws for such products (commutativity, 
distributivity, and so on). Prove two of them. 

[Hint for (a): Write the products $a_i b_j$ in an $\infty \times \infty$ matrix array $M$, and 
let $A_n, B_n, C_n$ be the $n$th partial sums of $\sum a_i$, $\sum b_j$, $\sum c_k$. You are asked 
to prove that $(\lim A_n)(\lim B_n) = \lim C_n$. The product of the limits is the 
limit of the products. The product $A_n B_n$ is the sum of all the $a_i b_j$ in 
the $n \times n$ corner submatrix of $M$ and $c_n$ is the sum of its antidiagonal. 
Now estimate $A_n B_n - C_n$. Alternately, assume that $a_n, b_n \geq 0$ and draw 
a rectangle $R$ with edges $A$, $B$. Observe that $R$ is the union of rectangles 
$R_{ij}$ with edges $a_i$, $b_j$.] 

**73. With reference to Exercise 72, 

(a) Reduce the hypothesis that both series $\sum a_i$ and $\sum b_j$ are absolutely 
convergent to merely one being absolutely convergent and the other 
convergent. (Exercises 72 and 73(a) are known as Mertens’ Theorem.) 

(b) Find an example to show that the Cauchy product of two conditionally 
convergent series may diverge. 

**74. The Riemann $\zeta$-function is defined to be $\zeta(s) = \sum_{n=1}^{\infty} n^{-s}$ where $s > 1$. It 
is the sum of the $p$-series when $p = s$. Establish Euler’s product formula, 

$$\zeta(s) = \prod_{k=1}^{\infty} \frac{1}{1 - p_k^{-s}}$$

where $p_k$ is the $k$th prime number. Thus, $p_1 = 2$, $p_2 = 3$, and so on. Prove that 
the infinite product converges. [Hint: Each factor in the infinite product is the 
sum of a geometric series $1 + p_k^{-s} + (p_k^{-s})^2 + \cdots$. Replace each factor by its 
geometric series and write out the partial product. Apply Mertens’ Theorem, 
collect terms, and recall that every integer has a unique prime factorization.] 


4 

Function Spaces 


1 Uniform Convergence and $C^0[a, b]$ 


Points converge to a limit if they get physically closer and closer to it. What about 
a sequence of functions? When do functions converge to a limit function? What 
should it mean that they get closer and closer to a limit function? The simplest idea 
is that a sequence of functions $f_n$ converges to a limit function $f$ if for each $x$, the 
values $f_n(x)$ converge to $f(x)$ as $n \to \infty$. This is called pointwise convergence: 
A sequence of functions $f_n : [a, b] \to \mathbb{R}$ converges pointwise to a limit function 
$f : [a, b] \to \mathbb{R}$ if for each $x \in [a, b]$ we have 

$$\lim_{n \to \infty} f_n(x) = f(x).$$

The function $f$ is the pointwise limit of the sequence $(f_n)$ and we write 

$$f_n \to f \quad \text{or} \quad \lim_{n \to \infty} f_n = f.$$

Note that the limit refers to $n \to \infty$, not to $x \to \infty$. The same definition applies to 
functions from one metric space to another. 

The requirement of uniform convergence is stronger. The sequence of functions 
$f_n : [a, b] \to \mathbb{R}$ converges uniformly to the limit function $f : [a, b] \to \mathbb{R}$ if for each 
$\epsilon > 0$ there is an $N$ such that for all $n \geq N$ and all $x \in [a, b]$, 

$$|f_n(x) - f(x)| < \epsilon. \tag{1}$$

The function $f$ is the uniform limit of the sequence $(f_n)$ and we write 

$$f_n \rightrightarrows f \quad \text{or} \quad \operatorname*{unif\,lim}_{n \to \infty} f_n = f.$$

Your intuition about uniform convergence is crucial. Draw a tube $V$ of vertical 
radius $\epsilon$ around the graph of $f$. For $n$ large, the graph of $f_n$ must lie wholly in $V$. 
See Figure 87. Absorb this picture! 

Figure 87 The graph of $f_n$ is contained in the $\epsilon$-tube around the graph of $f$. 

It is clear that uniform convergence implies pointwise convergence. The difference 
between the two definitions is apparent in the following standard example. 

Example Define $f_n : (0, 1) \to \mathbb{R}$ by $f_n(x) = x^n$. For each $x \in (0, 1)$ it is clear that 
$f_n(x) \to 0$. The functions converge pointwise to the zero function as $n \to \infty$. They 
do not converge uniformly. For if $\epsilon = 1/10$ then the point $x_n = \sqrt[n]{1/2}$ is sent by $f_n$ 
to 1/2 and thus not all points $x$ satisfy (1) when $n$ is large. The graph of $f_n$ fails to 
lie in the $\epsilon$-tube $V$. See Figure 88. 
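
The failure of uniformity in this example can be checked numerically. The sketch below is only an illustration; the helper names `f` and `witness` are hypothetical and not part of the text. It compares the value of $f_n$ at a fixed point, which tends to 0, with its value at the moving point $x_n = \sqrt[n]{1/2}$, which stays at 1/2.

```python
def f(n, x):
    """The n-th function of the example, f_n(x) = x**n on (0, 1)."""
    return x ** n


def witness(n):
    """The moving point x_n = (1/2)**(1/n), which f_n sends to 1/2."""
    return 0.5 ** (1.0 / n)


for n in (1, 10, 100, 1000):
    x_fixed = 0.9            # a fixed point: f_n(x_fixed) tends to 0
    x_n = witness(n)         # a moving point: f_n(x_n) stays at 1/2
    print(n, f(n, x_fixed), x_n, f(n, x_n))
```

However large $n$ is, the point $x_n$ lies in $(0, 1)$ and violates (1) for $\epsilon = 1/10$, so no single $N$ works for all $x$ at once.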

The lesson to draw is that pointwise convergence of a sequence of functions is 
frequently too weak a concept. Gravitating toward uniform convergence we ask the 
natural question: 


Which properties of functions are 
preserved under uniform convergence? 
The answers are found in Theorem 1, Exercise 4, Theorem 6, and Theorem 9. Uniform 
limits preserve continuity, uniform continuity, integrability, and - with an additional 
hypothesis - differentiability. 

Figure 88 Non-uniform, pointwise convergence 


1 Theorem If $f_n \rightrightarrows f$ and each $f_n$ is continuous at $x_0$ then $f$ is continuous at $x_0$. 
In other words, 

The uniform limit of continuous functions is continuous. 

Proof For simplicity, assume that the functions have domain $[a, b]$ and target $\mathbb{R}$. 
(See also Section 8 and Exercise 2.) Let $\epsilon > 0$ and $x_0 \in [a, b]$ be given. There is an 
$N$ such that for all $n \geq N$ and all $x \in [a, b]$ we have 

$$|f_n(x) - f(x)| < \frac{\epsilon}{3}.$$

The function $f_N$ is continuous at $x_0$ and so there is a $\delta > 0$ such that $|x - x_0| < \delta$ 
implies 

$$|f_N(x) - f_N(x_0)| < \frac{\epsilon}{3}.$$

Thus, if $|x - x_0| < \delta$ then 

$$|f(x) - f(x_0)| \leq |f(x) - f_N(x)| + |f_N(x) - f_N(x_0)| + |f_N(x_0) - f(x_0)| < \frac{\epsilon}{3} + \frac{\epsilon}{3} + \frac{\epsilon}{3} = \epsilon,$$

which completes the proof that $f$ is continuous at $x_0$. □ 
Without uniform convergence the theorem fails. For example, we can define 
$f_n : [0, 1] \to \mathbb{R}$ as before, $f_n(x) = x^n$. Then $f_n(x)$ converges pointwise to the function 

$$f(x) = \begin{cases} 0 & \text{if } 0 \leq x < 1 \\ 1 & \text{if } x = 1. \end{cases}$$

The function $f$ is not continuous and the convergence is not uniform. What about the 
converse? If the limit and the functions are continuous, does pointwise convergence 
imply uniform convergence? The answer is “no,” as is shown by $x^n$ on $(0, 1)$. But 
what if the functions have a compact domain of definition, $[a, b]$? The answer is still 
“no.” 


Example John Kelley refers to this as the growing steeple. 

$$f_n(x) = \begin{cases} n^2 x & \text{if } 0 \leq x \leq \dfrac{1}{n} \\ 2n - n^2 x & \text{if } \dfrac{1}{n} \leq x \leq \dfrac{2}{n} \\ 0 & \text{if } \dfrac{2}{n} \leq x \leq 1. \end{cases}$$

See Figure 89. 

Then $\lim_{n \to \infty} f_n(x) = 0$ for each $x$, and $f_n$ converges pointwise to the function 
$f = 0$. Even if the functions have compact domain of definition, and are uniformly 
bounded and uniformly continuous, pointwise convergence does not imply uniform 
convergence. For an example, just multiply the growing steeple functions by $1/n$. 
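
A short numerical sketch of the growing steeple may help here. It is an illustration only, with hypothetical helper names (`steeple`, `sup_on_grid`), and it estimates suprema on a finite grid rather than computing them exactly. It shows the peak value $n$ growing without bound, the value at a fixed point becoming 0, and the rescaled functions $\frac{1}{n} f_n$ keeping sup norm about 1.

```python
def steeple(n, x):
    """Growing steeple f_n on [0, 1] (n >= 2): a tent of height n over [0, 2/n]."""
    if x <= 1.0 / n:
        return n * n * x
    if x <= 2.0 / n:
        return 2 * n - n * n * x
    return 0.0


def sup_on_grid(g, points=10001):
    """Approximate the sup of |g| over [0, 1] on an evenly spaced grid."""
    return max(abs(g(i / (points - 1))) for i in range(points))


for n in (2, 10, 100):
    peak = steeple(n, 1.0 / n)      # value at the tip of the tent, about n
    fixed = steeple(n, 0.25)        # value at a fixed x; it is 0 once 2/n <= 0.25
    scaled = sup_on_grid(lambda x: steeple(n, x) / n)   # sup of (1/n) f_n, about 1
    print(n, peak, fixed, scaled)
```

The rescaled functions are uniformly bounded by 1 and still converge pointwise to 0, yet their sup distance to the zero function does not shrink.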


The natural way to view uniform convergence is in a function space. Let $C_b = 
C_b([a, b], \mathbb{R})$ denote the set of all bounded functions $[a, b] \to \mathbb{R}$. The elements of $C_b$ 
are functions $f$, $g$, etc. Each is bounded. Define the sup norm on $C_b$ as 

$$\|f\| = \sup\{|f(x)| : x \in [a, b]\}.$$

The sup norm satisfies the norm axioms discussed in Chapter 1, page 28. 

$$\|f\| \geq 0 \text{ and } \|f\| = 0 \text{ if and only if } f = 0$$

$$\|cf\| = |c|\, \|f\|$$

$$\|f + g\| \leq \|f\| + \|g\|.$$

As we observed in Chapter 2, any norm defines a metric. In the case at hand, 

$$d(f, g) = \sup\{|f(x) - g(x)| : x \in [a, b]\}$$

is the corresponding metric on $C_b$. See Figure 90. To distinguish the norm $\|f\| = 
\sup |f(x)|$ from other norms on $C_b$ we sometimes write $\|f\|_{\sup}$ for the sup norm. 

The thing to remember is that $C_b$ is a metric space whose elements are functions. 
Ponder this. 

Figure 89 The sequence of functions converges pointwise to the zero function, but not uniformly. 

Figure 90 The sup norm of $f$ and the sup distance between $f$ and $g$ 


2 Theorem Convergence with respect to the sup metric $d$ is equivalent to uniform 
convergence. 

Proof If $d(f_n, f) \to 0$ then $\sup\{|f_n(x) - f(x)| : x \in [a, b]\} \to 0$, so $f_n \rightrightarrows f$, and 
conversely. □ 


3 Theorem $C_b$ is a complete metric space. 

Proof Let $(f_n)$ be a Cauchy sequence in $C_b$. For each individual $x_0 \in [a, b]$ the values 
$f_n(x_0)$ form a Cauchy sequence in $\mathbb{R}$ since 

$$|f_n(x_0) - f_m(x_0)| \leq \sup\{|f_n(x) - f_m(x)| : x \in [a, b]\} = d(f_n, f_m).$$

Thus, for each $x \in [a, b]$, 

$$\lim_{n \to \infty} f_n(x)$$

exists. Define this limit to be $f(x)$. It is clear that $f_n$ converges pointwise to $f$. In 
fact, the convergence is uniform. For let $\epsilon > 0$ be given. Since $(f_n)$ is a Cauchy 
sequence with respect to $d$, there exists $N$ such that $m, n \geq N$ imply 

$$d(f_n, f_m) < \frac{\epsilon}{2}.$$

Also, since $f_n$ converges pointwise to $f$, for each $x \in [a, b]$ there exists an $m = m(x) \geq 
N$ such that 

$$|f_m(x) - f(x)| < \frac{\epsilon}{2}.$$

If $n \geq N$ and $x \in [a, b]$ then 

$$|f_n(x) - f(x)| \leq |f_n(x) - f_{m(x)}(x)| + |f_{m(x)}(x) - f(x)| < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon.$$

Hence $f_n \rightrightarrows f$. The function $f$ is bounded. For $f_N$ is bounded and for all $x$ we 
have $|f_N(x) - f(x)| < \epsilon$. Thus $f \in C_b$. By Theorem 2, uniform convergence implies 
$d$-convergence, $d(f_n, f) \to 0$, and the Cauchy sequence $(f_n)$ converges to a limit in 
the metric space $C_b$. □ 
The preceding proof is subtle. The uniform inequality $d(f_n, f) < \epsilon$ is derived by 
nonuniform means - for each $x$ we make a separate estimate using an $m(x)$ depending 
nonuniformly on $x$. It is a case of the ends justifying the means. 

Let $C^0 = C^0([a, b], \mathbb{R})$ denote the set of continuous functions $[a, b] \to \mathbb{R}$. Each 
$f \in C^0$ belongs to $C_b$, since a continuous function defined on a compact domain is 
bounded. That is, $C^0 \subset C_b$. 

4 Corollary $C^0$ is a closed subset of $C_b$. It is a complete metric space. 

Proof Theorem 1 implies that a limit in $C_b$ of a sequence of functions in $C^0$ lies in 
$C^0$. That is, $C^0$ is closed in $C_b$. A closed subset of a complete space is complete. □ 


Just as it is reasonable to discuss the convergence of a sequence of functions we 
can also discuss the convergence of a series of functions $\sum f_k$. Merely consider the 
$n$th partial sum 

$$F_n(x) = \sum_{k=0}^{n} f_k(x).$$

It is a function. If the sequence of functions $(F_n)$ converges to a limit function $F$ 
then the series converges, and we write 

$$F(x) = \sum_{k=0}^{\infty} f_k(x).$$

If the sequence of partial sums converges uniformly then we say the series converges 
uniformly. If the series of absolute values $\sum |f_k(x)|$ converges then the series $\sum f_k$ 
converges absolutely. 


5 Weierstrass M-test If $\sum M_k$ is a convergent series of constants and if $f_k \in C_b$ 
satisfies $\|f_k\| \leq M_k$ for all $k$ then $\sum f_k$ converges uniformly and absolutely. 

Proof If $n > m$ then the partial sums of the series of absolute values telescope as 

$$d(F_n, F_m) \leq d(F_n, F_{n-1}) + \cdots + d(F_{m+1}, F_m) = \sum_{k=m+1}^{n} \|f_k\| \leq \sum_{k=m+1}^{n} M_k.$$

Since $\sum M_k$ converges, the last sum is $< \epsilon$ when $m, n$ are large. Thus $(F_n)$ is Cauchy 
in $C_b$, and by Theorem 3 it converges uniformly. □ 
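
The estimate in the proof can be watched at work on a concrete series. The sketch below is an illustration under assumptions: the series $\sum_{k \geq 1} \sin(kx)/k^2$ with $M_k = 1/k^2$ is chosen here and does not come from the text, the helper names are hypothetical, and the supremum is estimated on a grid over $[0, 2\pi]$. It compares the sup distance between two partial sums with the tail $\sum_{k=m+1}^{n} M_k$.

```python
import math


def partial_sum(n, x):
    """F_n(x) = sum of sin(k x) / k**2 for k = 1, ..., n."""
    return sum(math.sin(k * x) / k**2 for k in range(1, n + 1))


def tail_bound(m, n):
    """Sum of M_k = 1 / k**2 for k = m+1, ..., n."""
    return sum(1.0 / k**2 for k in range(m + 1, n + 1))


def sup_gap(m, n, points=401):
    """Grid estimate of sup |F_n(x) - F_m(x)| over [0, 2*pi]."""
    xs = [2 * math.pi * i / (points - 1) for i in range(points)]
    return max(abs(partial_sum(n, x) - partial_sum(m, x)) for x in xs)


for m, n in ((5, 50), (20, 200)):
    print(m, n, sup_gap(m, n), tail_bound(m, n))
```

In each printed row the estimated sup gap should not exceed the tail bound, and the tail bound shrinks as $m$ grows, which is the Cauchy condition used in the proof.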
Next we ask how integrals and derivatives behave with respect to uniform 
convergence. Integrals behave better than derivatives. 

6 Theorem The uniform limit of Riemann integrable functions is Riemann 
integrable, and the limit of the integrals is the integral of the limit, 

$$\lim_{n \to \infty} \int_a^b f_n(x)\, dx = \int_a^b \operatorname*{unif\,lim}_{n \to \infty} f_n(x)\, dx.$$

In other words, $\mathcal{R}$, the set of Riemann integrable functions defined on $[a, b]$, is a 
closed subset of $C_b$ and the integral functional $f \mapsto \int_a^b f(x)\, dx$ is a continuous map 
from $\mathcal{R}$ to $\mathbb{R}$. This extends the regularity hierarchy to 

$$C_b \supset \mathcal{R} \supset C^0 \supset C^1 \supset \cdots \supset C^\infty \supset C^\omega.$$

Theorem 6 gives the simplest condition under which the operations of taking limits 
and integrals commute. 

Proof Let $f_n \in \mathcal{R}$ be given and assume that $f_n \rightrightarrows f$ as $n \to \infty$. By the 
Riemann-Lebesgue Theorem, $f_n$ is bounded and there is a zero set $Z_n$ such that $f_n$ is continuous 
at each $x \in [a, b] \setminus Z_n$. Theorem 1 implies that $f$ is continuous at each $x \in [a, b] \setminus \bigcup Z_n$, 
while Theorem 3 implies that $f$ is bounded. Since $\bigcup Z_n$ is a zero set, the 
Riemann-Lebesgue Theorem implies that $f \in \mathcal{R}$. Finally 

$$\left| \int_a^b f(x)\, dx - \int_a^b f_n(x)\, dx \right| = \left| \int_a^b f(x) - f_n(x)\, dx \right| \leq \int_a^b |f(x) - f_n(x)|\, dx \leq d(f, f_n)(b - a) \to 0$$

as $n \to \infty$. Hence the integral of the limit is the limit of the integrals. □ 
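
The final estimate of the proof can be illustrated numerically. The sketch below rests on assumptions not taken from the text: the sequence $f_n(x) = x^2 + \sin(nx)/n$ on $[0, 1]$, which converges uniformly to $f(x) = x^2$ with $d(f, f_n) \leq 1/n$, and a midpoint Riemann sum standing in for the integral. The helper names are hypothetical.

```python
import math


def riemann_sum(g, a, b, pieces=20000):
    """Midpoint Riemann sum of g over [a, b]."""
    h = (b - a) / pieces
    return h * sum(g(a + (i + 0.5) * h) for i in range(pieces))


def f(x):
    """The assumed uniform limit, f(x) = x**2."""
    return x * x


def f_n(n, x):
    """The assumed approximants, f_n(x) = x**2 + sin(n x) / n."""
    return x * x + math.sin(n * x) / n


a, b = 0.0, 1.0
for n in (1, 10, 100):
    gap = abs(riemann_sum(lambda x: f_n(n, x), a, b) - riemann_sum(f, a, b))
    bound = (1.0 / n) * (b - a)     # d(f, f_n) <= 1/n because |sin| <= 1
    print(n, gap, bound)
```

Each printed gap between the integrals should sit below the bound $d(f, f_n)(b - a)$, and both shrink as $n$ grows.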


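7 Corollary If $f_n \in \mathcal{R}$ and $f_n \rightrightarrows f$ then the indefinite integrals converge uniformly, 

$$\int_a^x f_n(t)\, dt \rightrightarrows \int_a^x f(t)\, dt.$$

Proof As above, 

$$\left| \int_a^x f(t)\, dt - \int_a^x f_n(t)\, dt \right| \leq d(f_n, f)(x - a) \leq d(f_n, f)(b - a) \to 0$$

when $n \to \infty$. □ 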
8 Term by Term Integration Theorem A uniformly convergent series of 
integrable functions $\sum f_k$ can be integrated term-by-term in the sense that 

$$\int_a^b \sum_{k=0}^{\infty} f_k(x)\, dx = \sum_{k=0}^{\infty} \int_a^b f_k(x)\, dx.$$

Proof The sequence of partial sums $F_n$ converges uniformly to $\sum f_k$. Each $F_n$ 
belongs to $\mathcal{R}$ since it is the finite sum of members of $\mathcal{R}$. According to Theorem 6, 

$$\sum_{k=0}^{n} \int_a^b f_k(x)\, dx = \int_a^b F_n(x)\, dx \to \int_a^b \sum_{k=0}^{\infty} f_k(x)\, dx.$$

This shows that the series $\sum \int_a^b f_k(x)\, dx$ converges to $\int_a^b \sum f_k(x)\, dx$. □ 
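
As a small worked check of term-by-term integration, consider the geometric series $\sum x^k = 1/(1 - x)$ on $[0, 1/2]$, where it converges uniformly by the M-test with $M_k = 2^{-k}$. This example and the helper names below are chosen here for illustration and do not come from the text. Integrating each term gives $\sum b^{k+1}/(k+1)$, which should approach $\int_0^b \frac{dx}{1 - x} = -\log(1 - b)$.

```python
import math


def term_integral(k, b):
    """Integral of x**k over [0, b], namely b**(k+1) / (k+1)."""
    return b ** (k + 1) / (k + 1)


def integrated_series(n, b):
    """Sum of the integrals of the terms 1, x, ..., x**n over [0, b]."""
    return sum(term_integral(k, b) for k in range(n + 1))


b = 0.5
exact = -math.log(1.0 - b)      # integral of 1/(1 - x) over [0, b]
for n in (5, 10, 40):
    print(n, integrated_series(n, b), exact)
```

The printed partial sums should settle on the exact value $\log 2$ as $n$ grows.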


9 Theorem The uniform limit of a sequence of differentiable functions is 
differentiable provided that the sequence of derivatives also converges uniformly. 

Proof We suppose that $f_n : [a, b] \to \mathbb{R}$ is differentiable for each $n$ and that $f_n \rightrightarrows f$ 
as $n \to \infty$. Also we assume that $f_n' \rightrightarrows g$ for some function $g$. Then we show that $f$ 
is differentiable and in fact $f' = g$. 

We first prove the theorem with a major loss of generality - we assume that each 
$f_n'$ is continuous. Then $f_n', g \in \mathcal{R}$ and we can apply the Fundamental Theorem of 
Calculus and Corollary 7 to write 

$$f_n(x) = f_n(a) + \int_a^x f_n'(t)\, dt \rightrightarrows f(a) + \int_a^x g(t)\, dt.$$

Since $f_n \rightrightarrows f$ we see that $f(x) = f(a) + \int_a^x g(t)\, dt$ and, again by the Fundamental 
Theorem of Calculus, $f' = g$. 

In the general case the proof is harder. Fix some $x \in [a, b]$ and define 

$$\phi_n(t) = \begin{cases} \dfrac{f_n(t) - f_n(x)}{t - x} & \text{if } t \neq x \\ f_n'(x) & \text{if } t = x \end{cases} \qquad \phi(t) = \begin{cases} \dfrac{f(t) - f(x)}{t - x} & \text{if } t \neq x \\ g(x) & \text{if } t = x. \end{cases}$$

Each function $\phi_n$ is continuous since $\phi_n(t)$ converges to $f_n'(x)$ as $t \to x$. Also it 
is clear that $\phi_n$ converges pointwise to $\phi$ as $n \to \infty$. We claim the convergence is 
uniform. For any $m, n$ the Mean Value Theorem applied to the function $f_m - f_n$ gives 

$$\phi_m(t) - \phi_n(t) = \frac{(f_m(t) - f_n(t)) - (f_m(x) - f_n(x))}{t - x} = f_m'(\theta) - f_n'(\theta)$$

for some $\theta$ between $t$ and $x$. Since $f_n' \rightrightarrows g$ the difference $f_m' - f_n'$ tends uniformly to 0 as 
$m, n \to \infty$. Thus $(\phi_n)$ is Cauchy in $C^0$. Since $C^0$ is complete, $\phi_n$ converges uniformly 
to a limit function $\psi$, and $\psi$ is continuous. As already remarked, the pointwise limit 
of $\phi_n$ is $\phi$, and so $\psi = \phi$. Continuity of $\psi = \phi$ implies that $g(x) = f'(x)$. □ 

10 Theorem A uniformly convergent series of differentiable functions can be 
differentiated term-by-term, provided that the derivative series converges uniformly, 

$$\left( \sum_{k=0}^{\infty} f_k(x) \right)' = \sum_{k=0}^{\infty} f_k'(x).$$

Proof Apply Theorem 9 to the sequence of partial sums. □ 


Note that Theorem 9 fails if we forget to assume the derivatives converge. For 
example, consider the sequence of functions $f_n : [-1, 1] \to \mathbb{R}$ defined by 

$$f_n(x) = \sqrt{x^2 + \frac{1}{n}}.$$

See Figure 91. The functions converge uniformly to $f(x) = |x|$, a nondifferentiable 
function. The derivatives converge pointwise but not uniformly. Worse examples 
are easy to imagine. In fact, a sequence of everywhere differentiable functions can 
converge uniformly to a nowhere differentiable function. See Sections 4 and 7. It is one 
of the miracles of the complex numbers that a uniform limit of complex differentiable 
functions is complex differentiable, and automatically the sequence of derivatives 
converges uniformly to a limit. Real and complex analysis diverge radically on this 
point. 
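
The failure of Theorem 9 without uniform convergence of the derivatives can be seen numerically. The sketch below takes $f_n(x) = \sqrt{x^2 + 1/n}$ as an assumed example of differentiable functions converging uniformly to $|x|$; the helper names are hypothetical. The gap $f_n(x) - |x|$ is largest at $x = 0$, where it equals $1/\sqrt{n}$, while the derivative at the moving point $x_n = 1/\sqrt{n}$ stays at $1/\sqrt{2}$ instead of approaching the slope 1 of $|x|$ there.

```python
import math


def f(n, x):
    """Assumed example: f_n(x) = sqrt(x**2 + 1/n), differentiable on [-1, 1]."""
    return math.sqrt(x * x + 1.0 / n)


def df(n, x):
    """Derivative of f_n computed by hand: x / sqrt(x**2 + 1/n)."""
    return x / math.sqrt(x * x + 1.0 / n)


for n in (1, 100, 10000):
    sup_gap = f(n, 0.0) - abs(0.0)      # the gap f_n(x) - |x| is largest at x = 0
    x_n = 1.0 / math.sqrt(n)            # a moving point approaching 0
    print(n, sup_gap, df(n, x_n))       # the gap shrinks; df stays near 0.7071
```

So the functions converge uniformly while the derivatives converge only pointwise, which is the situation the hypothesis of Theorem 9 rules out.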


2 Power Series 

As another application of the Weierstrass M-test we say a little more about the power 
series $\sum c_k x^k$. A power series is a special type of series of functions, the functions 
being constant multiples of powers of $x$. As explained in Section 3 of Chapter 3, its 
radius of convergence is 

$$R = \frac{1}{\limsup_{k \to \infty} \sqrt[k]{|c_k|}}.$$

Its interval of convergence is $(-R, R)$. If $x \in (-R, R)$, the series converges and defines 
a function $f(x) = \sum c_k x^k$, while if $x \notin [-R, R]$ the series diverges. More is true on 
compact subintervals of $(-R, R)$. 

Figure 91 The uniform limit of differentiable functions need not be differentiable. 

11 Theorem If $r < R$ then the power series converges uniformly and absolutely on 
the interval $[-r, r]$. 

Proof Choose $\beta$ with $r < \beta < R$. For all large $k$, $\sqrt[k]{|c_k|} < 1/\beta$ since $\beta < R$. Thus, 
if $|x| \leq r$ then 

$$|c_k x^k| \leq \left( \frac{r}{\beta} \right)^k.$$

These are terms in a convergent geometric series and according to the M-test $\sum c_k x^k$ 
converges uniformly when $x \in [-r, r]$. □ 
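
The role of the compact subinterval can be seen on the geometric series $\sum x^k = 1/(1 - x)$, which has $R = 1$. This series and the helper names below are chosen here for illustration and are not from the text; the supremum is estimated on a grid. The error of the $n$th partial sum at $x$ is $|x|^{n+1}/(1 - x)$, so on $[-r, r]$ it is at most $r^{n+1}/(1 - r)$.

```python
def partial_sum(n, x):
    """Sum of x**k for k = 0, ..., n."""
    return sum(x ** k for k in range(n + 1))


def sup_error(n, r, points=401):
    """Grid estimate of sup |1/(1 - x) - partial_sum(n, x)| over [-r, r]."""
    xs = [-r + 2 * r * i / (points - 1) for i in range(points)]
    return max(abs(1.0 / (1.0 - x) - partial_sum(n, x)) for x in xs)


for n in (10, 50, 200):
    # On [-0.5, 0.5] the error collapses; on [-0.99, 0.99] it shrinks far more slowly.
    print(n, sup_error(n, 0.5), sup_error(n, 0.99))
```

For each fixed $r < 1$ the sup error tends to 0, but the rate degrades as $r$ approaches $R$, which is why the theorem is stated on $[-r, r]$ rather than on all of $(-R, R)$.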


12 Theorem A power series can be integrated and differentiated term-by-term on 
its interval of convergence. 

For $f(x) = \sum c_k x^k$ and $|x| < R$ this means 

$$\int_0^x f(t)\, dt = \sum_{k=0}^{\infty} \frac{c_k}{k + 1} x^{k+1} \quad \text{and} \quad f'(x) = \sum_{k=1}^{\infty} k c_k x^{k-1}.$$

Proof The radius of convergence of the integral series is determined by the 
exponential growth rate of its coefficients, 

$$\limsup_{k \to \infty} \sqrt[k]{\left| \frac{c_{k-1}}{k} \right|} = \limsup_{k \to \infty} \left( |c_{k-1}|^{1/(k-1)} \right)^{(k-1)/k} \left( \frac{1}{k} \right)^{1/k}.$$

Since $(k - 1)/k \to 1$ and $k^{-1/k} \to 1$ as $k \to \infty$, we see that the integral series has the 
same radius of convergence $R$ as the original series. According to Theorem 8, 
term-by-term integration is valid when the series converges uniformly, and by Theorem 11, 
the integral series does converge uniformly on every closed interval $[-r, r]$ contained 
in $(-R, R)$. 

A similar calculation for the derivative series shows that its radius of convergence 
too is $R$. Term-by-term differentiation is valid provided the series and the derivative 
series converge uniformly. Since the radius of convergence of the derivative series is 
$R$, the derivative series does converge uniformly on every $[-r, r] \subset (-R, R)$. □ 


13 Theorem Analytic functions are smooth, i.e., $C^\omega \subset C^\infty$.


Proof An analytic function $f$ is defined by a convergent power series. According to
Theorem 12, the derivative of $f$ is given by a convergent power series with the same
radius of convergence, so repeated differentiation is valid, and we see that $f$ is indeed
smooth. □


The general smooth function is not analytic, as is shown by the example

$$e(x) = \begin{cases} e^{-1/x} & \text{if } x > 0 \\ 0 & \text{if } x \le 0 \end{cases}$$

on page 149. Near $x = 0$, $e(x)$ cannot be expressed as a convergent power series.

Power series provide a clean and unambiguous way to define functions, especially 
trigonometric functions. The usual definitions of sine, cosine, etc. involve angles 
and circular arc length, and these concepts seem less fundamental than the functions 
being defined. To avoid circular reasoning, as it were, we declare that by definition 


$$\exp x = \sum_{k=0}^{\infty} \frac{x^k}{k!} \qquad \sin x = \sum_{k=0}^{\infty} \frac{(-1)^k x^{2k+1}}{(2k+1)!} \qquad \cos x = \sum_{k=0}^{\infty} \frac{(-1)^k x^{2k}}{(2k)!}.$$
We then must prove that these functions have the properties we know and love from
calculus. All three series are easily seen to have radius of convergence $R = \infty$.
Theorem 12 justifies term-by-term differentiation, yielding the usual formulas,

$$\exp'(x) = \exp x \qquad \sin'(x) = \cos x \qquad \cos'(x) = -\sin x.$$

The logarithm has already been defined as the indefinite integral $\int_1^x 1/t\,dt$. We claim
that if $|x| < 1$ then $\log(1 + x)$ is given as the power series

$$\log(1 + x) = \sum_{k=1}^{\infty} \frac{(-1)^{k+1}}{k}\, x^k.$$

To check this, we merely note that its derivative is the sum of a geometric series,

$$(\log(1 + x))' = \frac{1}{1 + x} = \frac{1}{1 - (-x)} = \sum_{k=0}^{\infty} (-x)^k = \sum_{k=0}^{\infty} (-1)^k x^k.$$

The last is a power series with radius of convergence 1. Since term-by-term integration
of a power series inside its radius of convergence is legal, we integrate both sides of
the equation and get the series expression for $\log(1 + x)$ as claimed.
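
The series for $\log(1 + x)$ can be checked numerically. The following is a minimal sketch, not part of the original text: the function name `log1p_partial_sum` and the choice of 60 terms are illustrative assumptions. It compares a partial sum of the series with the library logarithm at a point inside the interval $|x| < 1$.

```python
import math


def log1p_partial_sum(x, n_terms):
    """Partial sum of sum_{k=1}^{n_terms} (-1)^(k+1) x^k / k (hypothetical helper)."""
    total = 0.0
    power = 1.0
    for k in range(1, n_terms + 1):
        power *= x                      # power now holds x^k
        sign = 1.0 if k % 2 == 1 else -1.0
        total += sign * power / k
    return total


# Inside the radius of convergence the partial sums approach log(1 + x).
approx = log1p_partial_sum(0.5, 60)
exact = math.log(1.5)
print(approx, exact)
```

Closer to $x = \pm 1$ many more terms are needed, which is consistent with the radius of convergence being exactly 1.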

The functions $e^x$ and $1/(1 + x^2)$ both have perfectly smooth graphs, but the power
series for $e^x$ has radius of convergence $\infty$ while that of $1/(1 + x^2)$ is 1. Why is this?
What goes “wrong” at radius 1? The function $1/(1 + x^2)$ doesn’t blow up or have bad
behavior at $x = \pm 1$ like $\log(1 + x)$ does. It’s because of $\mathbb{C}$. The denominator $1 + x^2$
equals 0 when $x = \pm\sqrt{-1}$. The bad behavior in $\mathbb{C}$ wipes out the good behavior in $\mathbb{R}$.


3 Compactness and Equicontinuity in C° 

The Heine-Borel theorem states that a closed and bounded set in $\mathbb{R}^m$ is compact.
On the other hand, closed and bounded sets in $C^0$ are rarely compact. Consider, for
example, the closed unit ball

$$\mathcal{B} = \{f \in C^0([0,1], \mathbb{R}) : \|f\| \le 1\}.$$

To see that $\mathcal{B}$ is not compact we look again at the sequence $f_n(x) = x^n$. It lies in
$\mathcal{B}$. Does it have a subsequence that converges (with respect to the metric $d$ of $C^0$)
to a limit in $C^0$? No. For if $f_{n_k}$ converges to $f$ in $C^0$ then $f(x) = \lim_{k \to \infty} f_{n_k}(x)$. Thus
$f(x) = 0$ if $x < 1$ and $f(1) = 1$, but this function $f$ does not belong to $C^0$. The cause
of the problem is the fact that $C^0$ is infinite-dimensional. In fact it can be shown
that if $V$ is a vector space with a norm then its closed unit ball is compact if and
only if the space is finite-dimensional. The proof is not especially hard.
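
The failure of compactness can also be seen numerically: the sup distance between $x^n$ and $x^{2n}$ on $[0,1]$ equals $1/4$ for every $n$ (the maximum of $y - y^2$ at $y = x^n = 1/2$), so no subsequence of $(x^n)$ can be Cauchy in the sup metric. The sketch below is an illustration under stated assumptions: the helper name `sup_distance` and the grid size are hypothetical choices, and the supremum is only approximated on a finite grid.

```python
def sup_distance(n, m, grid=100001):
    """Approximate sup over [0, 1] of |x^n - x^m| on an evenly spaced grid."""
    xs = [i / (grid - 1) for i in range(grid)]
    return max(abs(x**n - x**m) for x in xs)


# The distance between f_n and f_{2n} does not shrink as n grows.
for n in (5, 50, 500):
    print(n, sup_distance(n, 2 * n))
```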

Nevertheless, we want to have theorems that guarantee certain closed and bounded
subsets of $C^0$ are compact. For we want to extract a convergent subsequence of functions
from a given sequence of functions. The simple condition that lets us go ahead
is equicontinuity. A sequence of functions $(f_n)$ in $C^0$ is equicontinuous if

$$\forall \epsilon > 0 \ \exists \delta > 0 \text{ such that } |s - t| < \delta \text{ and } n \in \mathbb{N} \ \Rightarrow\ |f_n(s) - f_n(t)| < \epsilon.$$

The functions $f_n$ are equally continuous. The $\delta$ depends on $\epsilon$ but it does not depend
on $n$. Roughly speaking, the graphs of all the $f_n$ are similar. For total clarity, the
concept might better be labeled uniform equicontinuity, in contrast to pointwise
equicontinuity, which requires

$$\forall \epsilon > 0 \text{ and } \forall x \in [a, b] \ \exists \delta > 0 \text{ such that } |x - t| < \delta \text{ and } n \in \mathbb{N} \ \Rightarrow\ |f_n(x) - f_n(t)| < \epsilon.$$

The definitions work equally well for sets of functions, not only sequences of functions.
The set $\mathcal{E} \subset C^0$ is equicontinuous if

$$\forall \epsilon > 0 \ \exists \delta > 0 \text{ such that } |s - t| < \delta \text{ and } f \in \mathcal{E} \ \Rightarrow\ |f(s) - f(t)| < \epsilon.$$

The crucial point is that $\delta$ does not depend on the particular $f \in \mathcal{E}$. It is valid for all
$f \in \mathcal{E}$ simultaneously. To picture equicontinuity of a family $\mathcal{E}$, imagine the graphs.
Their shapes are uniformly controlled. Note that any finite number of continuous
functions $[a, b] \to \mathbb{R}$ forms an equicontinuous family so Figures 92 and 93 are only
suggestive.

The basic theorem about equicontinuity is the 

14 Arzela-Ascoli Theorem Every bounded equicontinuous sequence of functions
in $C^0([a, b], \mathbb{R})$ has a uniformly convergent subsequence.


Think of this as a compactness result. If $(f_n)$ is the sequence of equicontinuous
functions, the theorem amounts to asserting that the closure of the set $\{f_n : n \in \mathbb{N}\}$
is compact. Any compact metric space serves just as well as $[a, b]$, and the target
space $\mathbb{R}$ can also be more general. See Section 8.
Figure 92 Equicontinuity



Figure 93 Nonequicontinuity 
15 Lemma If $(f_k)$ is a subsequence of $(g_n)$ then for each $k$ we have $f_k = g_r$ for some
$r \ge k$.


Proof By definition of what a subsequence is, $f_k = g_{n_k}$ for some $n_k$ such that
$1 \le n_1 < n_2 < \cdots < n_k$. Hence $r = n_k \ge k$. □


Proof of the Arzela-Ascoli Theorem $[a, b]$ has a countable dense subset $D = \{d_1, d_2, \ldots\}$.
For instance we could take $D = \mathbb{Q} \cap [a, b]$. Boundedness of $(f_n)$ means
that for some constant $M$, all $x \in [a, b]$, and all $n \in \mathbb{N}$ we have $|f_n(x)| \le M$. Thus
$(f_n(d_1))$ is a bounded sequence of real numbers. Bolzano-Weierstrass implies that
some subsequence of it converges to a limit in $\mathbb{R}$, say

$$f_{1,k}(d_1) \to y_1 \text{ as } k \to \infty.$$

The subsequence $(f_{1,k})$ evaluated at the point $d_2$ is also a bounded sequence in $\mathbb{R}$,
and there exists a sub-subsequence $(f_{2,k})$ such that $f_{2,k}(d_2)$ converges to a limit in
$\mathbb{R}$, say $f_{2,k}(d_2) \to y_2$ as $k \to \infty$. The sub-subsequence evaluated at $d_1$ still converges
to $y_1$. Continuing in this way gives a nested family of subsequences $(f_{m,k})$ such that

$(f_{m,k})$ is a subsequence of $(f_{m-1,k})$
$j \le m \ \Rightarrow\ f_{m,k}(d_j) \to y_j$ as $k \to \infty$.


Now consider the diagonal subsequence $(g_m) = (f_{m,m})$. We claim that it converges
uniformly to a limit, which will complete the proof. First we show it converges
pointwise on $D$. Fix any $j \in \mathbb{N}$ and look at $m \ge j$. Lemma 15 implies that $f_{m,m} = f_{m-1,r_1}$
for some $r_1 \ge m$. Applying the lemma again, we see that $f_{m-1,r_1} = f_{m-2,r_2}$
for some $r_2 \ge r_1 \ge m$. Repetition gives

$$f_{m,m} = f_{m-1,r_1} = f_{m-2,r_2} = \cdots = f_{j,r}$$

for some $r = r_{m-j} \ge \cdots \ge r_2 \ge r_1 \ge m$. Since $r \ge m$ this gives

$$g_m(d_j) = f_{j,r}(d_j) \to y_j \text{ as } m \to \infty.$$


We claim that $g_m(x)$ converges also at the other points $x \in [a, b]$ and that the
convergence is uniform. It suffices to show that $(g_m)$ is a Cauchy sequence in $C^0$.

Let $\epsilon > 0$ be given. Equicontinuity gives a $\delta > 0$ such that for all $s, t \in [a, b]$ we
have

$$|s - t| < \delta \ \Rightarrow\ |g_m(s) - g_m(t)| < \frac{\epsilon}{3}.$$

Choose $J$ large enough that every $x \in [a, b]$ lies in the $\delta$-neighborhood of some $d_j$
with $j \le J$. Since $D$ is dense and $[a, b]$ is compact, this is possible. See Exercise 19.
Since $\{d_1, \ldots, d_J\}$ is a finite set and $g_m(d_j)$ converges for each $d_j$, there is an $N$ such
that for all $\ell, m \ge N$ and all $j \le J$,

$$|g_m(d_j) - g_\ell(d_j)| < \frac{\epsilon}{3}.$$

If $\ell, m \ge N$ and $x \in [a, b]$, choose $d_j$ with $|d_j - x| < \delta$ and $j \le J$. Then

$$|g_m(x) - g_\ell(x)| \le |g_m(x) - g_m(d_j)| + |g_m(d_j) - g_\ell(d_j)| + |g_\ell(d_j) - g_\ell(x)| < \frac{\epsilon}{3} + \frac{\epsilon}{3} + \frac{\epsilon}{3} = \epsilon.$$

Hence $(g_m)$ is Cauchy in $C^0$, it converges in $C^0$, and the proof is complete. □


Part of the preceding development can be isolated as the 

16 Arzela-Ascoli Propagation Theorem Pointwise convergence of an equicon- 
tinuous sequence of functions on a dense subset of the domain propagates to uniform 
convergence on the whole domain. 


Proof This is the $\epsilon/3$ part of the proof. □


The example cited over and over again in the equicontinuity world is the following. 


17 Corollary Assume that $f_n : [a, b] \to \mathbb{R}$ is a sequence of differentiable functions
whose derivatives are uniformly bounded. If for one point $x_0$, the sequence $(f_n(x_0))$ is
bounded as $n \to \infty$ then the sequence $(f_n)$ has a subsequence that converges uniformly
on the whole interval $[a, b]$.

Proof Let $M$ be a bound for the derivatives $|f_n'(x)|$, valid for all $n \in \mathbb{N}$ and all
$x \in [a, b]$. Equicontinuity of $(f_n)$ follows from the Mean Value Theorem:

$$|s - t| < \delta \ \Rightarrow\ |f_n(s) - f_n(t)| = |f_n'(\theta)|\,|s - t| \le M\delta$$

for some $\theta$ between $s$ and $t$. Thus, given $\epsilon > 0$, the choice $\delta = \epsilon/(M + 1)$ shows that
$(f_n)$ is equicontinuous.

Let $C$ be a bound for $|f_n(x_0)|$, valid for all $n \in \mathbb{N}$. Then

$$|f_n(x)| \le |f_n(x) - f_n(x_0)| + |f_n(x_0)| \le M|x - x_0| + C \le M|b - a| + C$$

shows that the sequence $(f_n)$ is bounded in $C^0$. The Arzela-Ascoli theorem then
supplies the uniformly convergent subsequence. □
Two other consequences of the same type are fundamental theorems in the fields
of ordinary differential equations and complex variables.

(a) A sequence of solutions to a continuous ordinary differential equation in $\mathbb{R}^m$
has a subsequence that converges to a limit, and that limit is also a solution of
the ODE.

(b) A sequence of complex analytic functions that converges pointwise, converges 
uniformly (on compact subsets of the domain of definition) and the limit is 
complex analytic. 

Finally, we give a topological interpretation of the Arzela-Ascoli theorem. 

18 Heine-Borel Theorem in a Function Space A subset $\mathcal{E} \subset C^0$ is compact if
and only if it is closed, bounded, and equicontinuous.


Proof Assume that $\mathcal{E}$ is compact. By Theorem 2.65, it is closed and totally bounded.
This means that given $\epsilon > 0$ there is a finite covering of $\mathcal{E}$ by neighborhoods in $C^0$
having radius $\epsilon/3$, say $N_{\epsilon/3}(f_k)$, with $k = 1, \ldots, n$. Each $f_k$ is uniformly continuous
and there are only finitely many of them, so there is a single $\delta > 0$ such that

$$|s - t| < \delta \ \Rightarrow\ |f_k(s) - f_k(t)| < \frac{\epsilon}{3} \quad \text{for } k = 1, \ldots, n.$$

If $f \in \mathcal{E}$ then for some $k$ we have $f \in N_{\epsilon/3}(f_k)$, and $|s - t| < \delta$ implies

$$|f(s) - f(t)| \le |f(s) - f_k(s)| + |f_k(s) - f_k(t)| + |f_k(t) - f(t)| < \frac{\epsilon}{3} + \frac{\epsilon}{3} + \frac{\epsilon}{3} = \epsilon.$$

Thus $\mathcal{E}$ is equicontinuous.


Conversely, assume that $\mathcal{E}$ is closed, bounded, and equicontinuous. If $(f_n)$ is a
sequence in $\mathcal{E}$ then by the Arzela-Ascoli theorem, some subsequence $(f_{n_k})$ converges
uniformly to a limit. The limit lies in $\mathcal{E}$ since $\mathcal{E}$ is closed. Thus $\mathcal{E}$ is compact. □


4 Uniform Approximation in C° 

Given a continuous but nondifferentiable function $f$, we often want to make it
smoother by a small perturbation. We want to approximate $f$ in $C^0$ by a smooth
function $g$. The ultimately smooth function is a polynomial, and the first thing we
prove is a polynomial approximation result.


19 Weierstrass Approximation Theorem The set of polynomials is dense in
$C^0([a, b], \mathbb{R})$.

Density means that for each $f \in C^0$ and each $\epsilon > 0$ there is a polynomial function
$p(x)$ such that for all $x \in [a, b]$,

$$|f(x) - p(x)| < \epsilon.$$

There are several proofs of this theorem, and although they appear quite different
from each other, they share a common thread: The approximating function is built
from $f$ by sampling the values of $f$ and recombining them in some clever way. It is
no loss of generality to assume that the interval $[a, b]$ is $[0, 1]$. We do so.

Proof #1 For each $n \in \mathbb{N}$, consider the sum

$$p_n(x) = \sum_{k=0}^{n} \binom{n}{k} c_k x^k (1 - x)^{n-k}$$

where $c_k = f(k/n)$ and $\binom{n}{k}$ is the binomial coefficient $n!/k!(n - k)!$. Clearly $p_n$ is a
polynomial. It is called a Bernstein polynomial. We claim that the $n$th Bernstein
polynomial converges uniformly to $f$ as $n \to \infty$. The proof relies on two formulas
about how the functions

$$r_k(x) = \binom{n}{k} x^k (1 - x)^{n-k}$$

whose graphs are shown in Figure 94 behave. They are

$$\sum_{k=0}^{n} r_k(x) = 1 \tag{2}$$

$$\sum_{k=0}^{n} (k - nx)^2 r_k(x) = nx(1 - x). \tag{3}$$

In terms of the functions $r_k$ we write

$$p_n(x) = \sum_{k=0}^{n} c_k r_k(x) \qquad f(x) = \sum_{k=0}^{n} f(x) r_k(x).$$

Then we divide the sum $p_n - f = \sum (c_k - f) r_k$ into the terms where $k/n$ is near $x$,
and other terms where $k/n$ is far from $x$. More precisely, given $\epsilon > 0$ we use uniform
continuity of $f$ on $[0, 1]$ to find $\delta > 0$ such that $|t - s| < \delta$ implies $|f(t) - f(s)| < \epsilon/2$.
Then we set

$$K_1 = \left\{k \in \{0, \ldots, n\} : \left|\frac{k}{n} - x\right| < \delta\right\} \quad\text{and}\quad K_2 = \{0, \ldots, n\} \setminus K_1.$$
Figure 94 The seven basic Bernstein polynomials of degree 6,
$\binom{6}{k} x^k (1 - x)^{6-k}$, $k = 0, \ldots, 6$.


This gives

$$|p_n(x) - f(x)| \le \sum_{k=0}^{n} |c_k - f(x)|\, r_k(x) = \sum_{k \in K_1} |c_k - f(x)|\, r_k(x) + \sum_{k \in K_2} |c_k - f(x)|\, r_k(x).$$

The factors $|c_k - f(x)|$ in the first sum are less than $\epsilon/2$ since $c_k = f(k/n)$ and $k/n$
differs from $x$ by less than $\delta$. Since the sum of all the terms $r_k$ is 1 and the terms are
nonnegative, the first sum is less than $\epsilon/2$. To estimate the second sum, use (3) to
write

$$nx(1 - x) = \sum_{k=0}^{n} (k - nx)^2 r_k(x) \ge \sum_{k \in K_2} (k - nx)^2 r_k(x) \ge \sum_{k \in K_2} (n\delta)^2 r_k(x),$$

since $k \in K_2$ implies that $(k - nx)^2 \ge (n\delta)^2$. This implies that

$$\sum_{k \in K_2} r_k(x) \le \frac{nx(1 - x)}{(n\delta)^2} \le \frac{1}{4n\delta^2}$$

since $\max x(1 - x) = 1/4$ as $x$ varies in $[0, 1]$. The factors $|c_k - f(x)|$ in the second
sum are at most $2M$ where $M = \|f\|$. Thus the second sum is

$$\sum_{k \in K_2} |c_k - f(x)|\, r_k(x) \le \frac{M}{2n\delta^2} \le \frac{\epsilon}{2}$$

when $n$ is large, completing the proof that $|p_n(x) - f(x)| < \epsilon$ when $n$ is large.


It remains to check the identities (2) and (3). The binomial coefficients satisfy

$$(x + y)^n = \sum_{k=0}^{n} \binom{n}{k} x^k y^{n-k}, \tag{4}$$

which becomes (2) if we set $y = 1 - x$. On the other hand, if we fix $y$ and differentiate
(4) with respect to $x$ once, and then again, we get

$$n(x + y)^{n-1} = \sum_{k=0}^{n} \binom{n}{k} k x^{k-1} y^{n-k} \tag{5}$$

$$n(n - 1)(x + y)^{n-2} = \sum_{k=0}^{n} \binom{n}{k} k(k - 1) x^{k-2} y^{n-k}. \tag{6}$$

Note that the bottom term in (5) and the bottom two terms in (6) are 0. Multiplying
(5) by $x$ and (6) by $x^2$ and then setting $y = 1 - x$ in both equations gives

$$nx = \sum_{k=0}^{n} \binom{n}{k} k x^k (1 - x)^{n-k} = \sum_{k=0}^{n} k r_k(x) \tag{7}$$

$$n(n - 1)x^2 = \sum_{k=0}^{n} \binom{n}{k} k(k - 1) x^k (1 - x)^{n-k} = \sum_{k=0}^{n} k(k - 1) r_k(x). \tag{8}$$

The last sum is $\sum k^2 r_k(x) - \sum k r_k(x)$. Hence (7) and (8) become

$$\sum_{k=0}^{n} k^2 r_k(x) = n(n - 1)x^2 + \sum_{k=0}^{n} k r_k(x) = n(n - 1)x^2 + nx. \tag{9}$$
Using (2), (7), and (9) we get

$$\begin{aligned}
\sum_{k=0}^{n} (k - nx)^2 r_k(x) &= \sum_{k=0}^{n} k^2 r_k(x) - 2nx \sum_{k=0}^{n} k r_k(x) + (nx)^2 \sum_{k=0}^{n} r_k(x) \\
&= n(n - 1)x^2 + nx - 2(nx)^2 + (nx)^2 \\
&= -nx^2 + nx = nx(1 - x),
\end{aligned}$$

as claimed in (3). □
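
The Bernstein construction is easy to try out. The sketch below is illustrative rather than part of the proof: the names `basis`, `bernstein`, and `max_error`, the test function $|x - 1/2|$, and the evaluation grid are all assumptions made for the example. It evaluates $p_n$ directly from the formula with $c_k = f(k/n)$ and spot-checks identities (2) and (3) at one point.

```python
from math import comb


def basis(n, k, x):
    """r_k(x) = C(n, k) x^k (1 - x)^(n - k)."""
    return comb(n, k) * x**k * (1 - x) ** (n - k)


def bernstein(f, n, x):
    """n-th Bernstein polynomial of f on [0, 1], evaluated at x."""
    return sum(f(k / n) * basis(n, k, x) for k in range(n + 1))


def max_error(f, n, grid=101):
    """Largest |p_n(x) - f(x)| over an evenly spaced grid in [0, 1]."""
    xs = [i / (grid - 1) for i in range(grid)]
    return max(abs(bernstein(f, n, x) - f(x)) for x in xs)


def f(x):
    # Continuous but not differentiable at x = 1/2.
    return abs(x - 0.5)


# Identities (2) and (3) at n = 6, x = 0.3.
print(sum(basis(6, k, 0.3) for k in range(7)))
print(sum((k - 6 * 0.3) ** 2 * basis(6, k, 0.3) for k in range(7)), 6 * 0.3 * 0.7)

# The uniform error decreases as n grows, though slowly.
for n in (20, 200):
    print(n, max_error(f, n))
```

The slow decrease of the error for this $f$ reflects the estimate $M/(2n\delta^2)$ in the proof: Bernstein polynomials converge uniformly, but not quickly.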


Proof #2 Let $f \in C^0([0, 1], \mathbb{R})$ be given and let $g(x) = f(x) - (mx + b)$ where

$$m = f(1) - f(0) \quad\text{and}\quad b = f(0).$$

Then $g \in C^0$ and $g(0) = 0 = g(1)$. If we can approximate $g$ arbitrarily well by
polynomials, then the same is true of $f$ since $mx + b$ is a polynomial. In other words
it is no loss of generality to assume that $f(0) = f(1) = 0$ in the first place. We do
so. Also, we extend $f$ to all of $\mathbb{R}$ by defining $f(x) = 0$ for all $x \in \mathbb{R} \setminus [0, 1]$. Then we
consider a function

$$\beta_n(t) = b_n(1 - t^2)^n \qquad -1 \le t \le 1,$$

where the constant $b_n$ is chosen so that $\int_{-1}^{1} \beta_n(t)\,dt = 1$. As shown in Figure 95, $\beta_n$
is a kind of polynomial bump function. For $0 \le x \le 1$, set

$$P_n(x) = \int_{-1}^{1} f(x + t)\beta_n(t)\,dt.$$

This is a weighted average of the values of $f$ using the weight function $\beta_n$. We claim
that $P_n$ is a polynomial and $P_n(x) \rightrightarrows f(x)$ as $n \to \infty$.

To check that $P_n$ is a polynomial we use a change of variables, $u = x + t$. Then,
for $0 \le x \le 1$ we have

$$P_n(x) = \int_{x-1}^{x+1} f(u)\beta_n(u - x)\,du = \int_0^1 f(u)\beta_n(u - x)\,du$$

since $f = 0$ outside $[0, 1]$. The function $\beta_n(u - x) = b_n(1 - (u - x)^2)^n$ is a polynomial in
$x$ whose coefficients are polynomials in $u$. The powers of $x$ pull out past the integral
and we are left with these powers of $x$ multiplied by numbers, namely, the integrals
of the polynomials in $u$ times $f(u)$. In other words, by merely inspecting the last
formula, it becomes clear that $P_n(x)$ is a polynomial in $x$.
Figure 95 The graph of the function $\beta_6(t) = 1.467(1 - t^2)^6$


To check that $P_n \rightrightarrows f$ as $n \to \infty$, we need to estimate $\beta_n(t)$. We claim that if
$\delta > 0$ then

$$\beta_n(t) \rightrightarrows 0 \text{ as } n \to \infty \text{ and } \delta \le |t| \le 1. \tag{10}$$

This is “clear” from Figure 95. Proceeding more rigorously and using the definition
of $\beta_n$ as $\beta_n(t) = b_n(1 - t^2)^n$, we have

$$1 = \int_{-1}^{1} \beta_n(t)\,dt \ge \int_{-1/\sqrt{n}}^{1/\sqrt{n}} b_n(1 - t^2)^n\,dt \ge b_n \frac{2}{\sqrt{n}} \left(1 - \frac{1}{n}\right)^n.$$

Since $1/e = \lim_{n \to \infty} (1 - 1/n)^n$, we see that for some constant $c$ and all $n$,

$$b_n \le c\sqrt{n}.$$

See also Exercise 31. Hence, if $\delta \le |t| \le 1$ then

$$\beta_n(t) = b_n(1 - t^2)^n \le c\sqrt{n}\,(1 - \delta^2)^n \to 0 \text{ as } n \to \infty$$

due to the fact that $\sqrt{n}$ tends to $\infty$ more slowly than $(1 - \delta^2)^{-n}$ as $n \to \infty$. This
proves (10).
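
The normalizing constants $b_n$ can be computed exactly by expanding $(1 - t^2)^n$ with the binomial theorem and integrating term by term. The following sketch is an illustration only: the function name `landau_constant` is a hypothetical label, and the comparison with $\sqrt{n}$ is a numerical observation for the tested range, not a substitute for the estimate above.

```python
from fractions import Fraction
from math import comb, sqrt


def landau_constant(n):
    """b_n = 1 / integral_{-1}^{1} (1 - t^2)^n dt, computed exactly.

    Uses (1 - t^2)^n = sum_j (-1)^j C(n, j) t^(2j) and
    integral_{-1}^{1} t^(2j) dt = 2 / (2j + 1).
    """
    integral = sum(
        Fraction((-1) ** j * comb(n, j) * 2, 2 * j + 1) for j in range(n + 1)
    )
    return 1 / integral


# b_6 is close to the value quoted in the caption of Figure 95.
print(float(landau_constant(6)))

# The ratio b_n / sqrt(n) stays bounded in this range, as b_n <= c sqrt(n) predicts.
for n in (1, 10, 50):
    print(n, float(landau_constant(n)) / sqrt(n))
```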
From (10) we deduce that $P_n \rightrightarrows f$ as follows. Let $\epsilon > 0$ be given. Uniform
continuity of $f$ gives $\delta > 0$ such that $|t| < \delta$ implies $|f(x + t) - f(x)| < \epsilon/2$. Since $\beta_n$
has integral 1 on $[-1, 1]$ we have

$$\begin{aligned}
|P_n(x) - f(x)| &= \left| \int_{-1}^{1} (f(x + t) - f(x))\beta_n(t)\,dt \right| \le \int_{-1}^{1} |f(x + t) - f(x)|\,\beta_n(t)\,dt \\
&= \int_{|t| < \delta} |f(x + t) - f(x)|\,\beta_n(t)\,dt + \int_{|t| \ge \delta} |f(x + t) - f(x)|\,\beta_n(t)\,dt.
\end{aligned}$$

The first integral is less than $\epsilon/2$, while the second is at most $2M \int_{|t| \ge \delta} \beta_n(t)\,dt$, where $M = \|f\|$. By
(10), the second integral is less than $\epsilon/2$ when $n$ is large. Thus $P_n \rightrightarrows f$ as claimed. □


Next we see how to extend this result to functions defined on a compact metric
space $M$ instead of merely on an interval. A subset $\mathcal{A}$ of $C^0M = C^0(M, \mathbb{R})$ is a
function algebra if it is closed under addition, scalar multiplication, and function
multiplication. That is, if $f, g \in \mathcal{A}$ and $c$ is a constant then $f + g$, $cf$, and $f \cdot g$ belong
to $\mathcal{A}$. For example, the set of polynomials is a function algebra. The function algebra
vanishes at a point $p$ if $f(p) = 0$ for all $f \in \mathcal{A}$. For example, the function algebra
of all polynomials with zero constant term vanishes at $x = 0$. The function algebra
separates points if for each pair of distinct points $p_1, p_2 \in M$ there is a function
$f \in \mathcal{A}$ such that $f(p_1) \ne f(p_2)$. For example, the function algebra of all trigonometric
polynomials separates points of $[0, 2\pi)$ and vanishes nowhere.


20 Stone-Weierstrass Theorem If $M$ is a compact metric space and $\mathcal{A}$ is a function
algebra in $C^0M$ that vanishes nowhere and separates points then $\mathcal{A}$ is dense in
$C^0M$.


Although the Weierstrass Approximation Theorem is a special case of the Stone-Weierstrass
Theorem, the proof of the latter does not stand on its own; it depends
crucially on the former. We also need two lemmas.

21 Lemma If $\mathcal{A}$ vanishes nowhere and separates points then there exists $f \in \mathcal{A}$ with
specified values at any pair of distinct points.

Proof Given distinct points $p_1, p_2 \in M$, and given constants $c_1, c_2$, we seek a function
$f \in \mathcal{A}$ such that $f(p_1) = c_1$ and $f(p_2) = c_2$.
Since $\mathcal{A}$ vanishes nowhere there exist $g_1, g_2 \in \mathcal{A}$ such that $g_1(p_1) \ne 0 \ne g_2(p_2)$.
Then $g = g_1^2 + g_2^2$ belongs to $\mathcal{A}$ and vanishes at neither $p_1$ nor $p_2$. Since $\mathcal{A}$ separates
points there exists $h \in \mathcal{A}$ with different values at $p_1$ and $p_2$. Consider the matrix

$$H = \begin{bmatrix} a & ab \\ c & cd \end{bmatrix} = \begin{bmatrix} g(p_1) & g(p_1)h(p_1) \\ g(p_2) & g(p_2)h(p_2) \end{bmatrix}.$$

By construction $a, c \ne 0$ and $b \ne d$. Hence $\det H = acd - abc = ac(d - b) \ne 0$, $H$ has
rank 2, and the linear equations

$$\begin{aligned} a\xi + ab\eta &= c_1 \\ c\xi + cd\eta &= c_2 \end{aligned}$$

have a solution $(\xi, \eta)$. Then $f = \xi g + \eta gh$ belongs to $\mathcal{A}$ and $f(p_1) = c_1$, $f(p_2) = c_2$. □

22 Lemma The closure of a function algebra in $C^0M$ is a function algebra.

Proof Clear enough. □


Proof of the Stone-Weierstrass Theorem Let $\mathcal{A}$ be a function algebra in $C^0M$
that vanishes nowhere and separates points. We must show that $\mathcal{A}$ is dense in $C^0M$.
Given $F \in C^0M$ and $\epsilon > 0$, we must find $G \in \mathcal{A}$ such that for all $x \in M$ we have

$$F(x) - \epsilon < G(x) < F(x) + \epsilon. \tag{11}$$

First we observe that

$$f \in \overline{\mathcal{A}} \ \Rightarrow\ |f| \in \overline{\mathcal{A}} \tag{12}$$

where $\overline{\mathcal{A}}$ denotes the closure of $\mathcal{A}$ in $C^0M$. Let $\epsilon > 0$ be given. According to the
Weierstrass Approximation Theorem, there exists a polynomial $p(y)$ such that

$$\sup\{|p(y) - |y|| : |y| \le \|f\|\} < \frac{\epsilon}{2}. \tag{13}$$

After all, $|y|$ is a continuous function defined on the interval $[-\|f\|, \|f\|]$. The constant
term of $p(y)$ is at most $\epsilon/2$ since $|p(0) - |0|| < \epsilon/2$. Let $q(y) = p(y) - p(0)$. Then $q(y)$
is a polynomial with zero constant term and (13) becomes

$$|q(y) - |y|| < \epsilon \tag{14}$$

for all $y \in [-\|f\|, \|f\|]$. Write $q(y) = a_1 y + a_2 y^2 + \cdots + a_n y^n$ and

$$g = a_1 f + a_2 f^2 + \cdots + a_n f^n.$$

(Here, $f^n$ denotes $f \cdot f \cdots f$.) Lemma 22 states that $\overline{\mathcal{A}}$ is an algebra, so $g \in \overline{\mathcal{A}}$.†
Besides, if $x \in M$ and $y = f(x)$ then

$$|g(x) - |f(x)|| = |q(y) - |y|| < \epsilon.$$

Hence $|f| \in \overline{\overline{\mathcal{A}}} = \overline{\mathcal{A}}$ as claimed in (12).

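Next we observe that if $f, g$ belong to $\overline{\mathcal{A}}$, then $\max(f, g)$ and $\min(f, g)$ also belong
to $\overline{\mathcal{A}}$. For

$$\max(f, g) = \frac{f + g}{2} + \frac{|f - g|}{2} \qquad \min(f, g) = \frac{f + g}{2} - \frac{|f - g|}{2}.$$

Repetition shows that the maximum and minimum of any finite number of functions
in $\overline{\mathcal{A}}$ also belong to $\overline{\mathcal{A}}$.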

Now we return to (11). Let $F \in C^0M$ and $\epsilon > 0$ be given. We are trying to find
$G \in \overline{\mathcal{A}}$ whose graph lies in the $\epsilon$-tube around the graph of $F$. Fix any distinct points
$p, q \in M$. According to Lemma 21, we can find a function in $\mathcal{A}$ with specified values
at $p, q$, so there exists $H_{pq} \in \mathcal{A}$ that satisfies

$$H_{pq}(p) = F(p) \quad\text{and}\quad H_{pq}(q) = F(q).$$

Fix $p$ and let $q$ vary. Each $q \in M$ has a neighborhood $U_q$ such that

$$x \in U_q \ \Rightarrow\ F(x) - \epsilon < H_{pq}(x). \tag{15}$$

For $H_{pq}(x) - F(x) + \epsilon$ is a continuous function of $x$ which is positive at $x = q$. See
Figure 96.

Compactness of $M$ implies that finitely many of these neighborhoods $U_q$ cover
$M$, say $U_{q_1}, \ldots, U_{q_n}$. Define

$$G_p(x) = \max(H_{pq_1}(x), \ldots, H_{pq_n}(x)).$$

Then $G_p \in \overline{\mathcal{A}}$ and, as shown in Figure 97, for all $x \in M$ we have

$$G_p(p) = F(p) \quad\text{and}\quad F(x) - \epsilon < G_p(x). \tag{16}$$

Continuity implies that each $p$ has a neighborhood $V_p$ such that

$$x \in V_p \ \Rightarrow\ G_p(x) < F(x) + \epsilon. \tag{17}$$
Figure 96 For all $x$ in a neighborhood of $q$ we have $H_{pq}(x) > F(x) - \epsilon$.

Figure 97 $G_p$ is the maximum of $H_{pq_i}$, $i = 1, \ldots, n$.

Figure 98 $G_p(p) = F(p)$ and $G_p > F - \epsilon$ everywhere.


See Figure 98. 

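By compactness, finitely many of these neighborhoods cover $M$, say $V_{p_1}, \ldots, V_{p_m}$.
Set

$$G(x) = \min(G_{p_1}(x), \ldots, G_{p_m}(x)).$$

We know that $G \in \overline{\mathcal{A}}$ and (16), (17) imply (11). Since $\epsilon$ is arbitrary, $F$ lies in the
closure of $\overline{\mathcal{A}}$, which is $\overline{\mathcal{A}}$ itself, so $\mathcal{A}$ is dense in $C^0M$. See Figure 99. □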


23 Corollary Any $2\pi$-periodic continuous function of $x \in \mathbb{R}$ can be uniformly approximated
by a trigonometric polynomial

$$T(x) = a_0 + \sum_{k=1}^{n} a_k \cos kx + \sum_{k=1}^{n} b_k \sin kx.$$

Proof Think of $[0, 2\pi)$ parameterizing the circle $S^1$ by $x \mapsto (\cos x, \sin x)$. The circle
is compact, and $2\pi$-periodic continuous functions on $\mathbb{R}$ become continuous functions
on $S^1$. The trigonometric polynomials on $S^1$ form an algebra $\mathcal{T} \subset C^0S^1$ that vanishes
nowhere and separates points. The Stone-Weierstrass Theorem implies that $\mathcal{T}$ is dense
in $C^0S^1$. □

† Since a function algebra need not contain constant functions, it was important that $q$ has no
constant term. One should not expect that $g = a_0 + a_1 f + \cdots + a_n f^n$ belongs to the algebra.
Figure 99 The graph of $G$ lies in the $\epsilon$-tube around the graph of $F$.


Here is a typical application of the Stone-Weierstrass Theorem: Consider a continuous
vector field $F : \Delta \to \mathbb{R}^2$ where $\Delta$ is the closed unit disc in the plane, and
suppose that we want to approximate $F$ by a vector field that vanishes (equals zero)
at most finitely often. A simple way to do so is to approximate $F$ by a polynomial
vector field $G$. Real polynomials in two variables are finite sums

$$P(x, y) = \sum_{i,j=0}^{n} c_{ij} x^i y^j$$

where the $c_{ij}$ are constants. They form a function algebra $\mathcal{A}$ in $C^0(\Delta, \mathbb{R})$ that separates
points and vanishes nowhere. By the Stone-Weierstrass Theorem, $\mathcal{A}$ is dense
in $C^0$, so we can approximate the components of $F = (F_1, F_2)$ by polynomials

$$F_1 \doteq P \qquad F_2 \doteq Q.$$

(The symbol $\doteq$ indicates “almost equal.”) The vector field $(P, Q)$ then approximates
$F$. Changing the coefficients of $P$ by a small amount ensures that $P$ and $Q$ have no
common polynomial factor and the approximating field $(P, Q)$ vanishes at most finitely often.
5 Contractions and ODEs

Fixed-point theorems are of great use in the applications of analysis, including the
basic theory of vector calculus such as the general implicit function theorem. If
$f : M \to M$ and for some $p \in M$ we have $f(p) = p$ then $p$ is a fixed-point of $f$.
When must $f$ have a fixed-point? This question has many answers, and the two most
famous are given in the next two theorems.

Let $M$ be a metric space. A contraction of $M$ is a mapping $f : M \to M$ such
that for some constant $k < 1$ and all $x, y \in M$ we have

$$d(fx, fy) \le k\,d(x, y).$$

24 Banach Contraction Principle Suppose that $f : M \to M$ is a contraction and
the metric space $M$ is complete. Then $f$ has a unique fixed-point $p$ and for any
$x \in M$, the iterate‡ $f^n(x) = f \circ f \circ \cdots \circ f(x)$ converges to $p$ as $n \to \infty$.

Brouwer Fixed-Point Theorem Suppose that $f : B^m \to B^m$ is continuous where
$B^m$ is the closed unit ball in $\mathbb{R}^m$. Then $f$ has a fixed-point $p \in B^m$.

The proof of the first result is fairly easy, the second not. See Figure 100 to picture
a contraction and Section 10 of Chapter 5 for a proof of the Brouwer theorem.

Proof #1 of the Banach Contraction Principle Beautiful, simple, and dynamical!
See Figure 100. Choose any $x_0 \in M$ and define $x_n = f^n(x_0)$. We claim that for
all $n \in \mathbb{N}$ we have

$$d(x_n, x_{n+1}) \le k^n d(x_0, x_1). \tag{18}$$

This is easy:

$$d(x_n, x_{n+1}) = d(f(x_{n-1}), f(x_n)) \le k\,d(x_{n-1}, x_n) \le k^2 d(x_{n-2}, x_{n-1}) \le \cdots \le k^n d(x_0, x_1).$$

From this and a geometric series type of estimate, it follows that the sequence $(x_n)$
is Cauchy. For let $\epsilon > 0$ be given. Choose $N$ large enough that

$$\frac{k^N}{1 - k}\, d(x_0, x_1) < \epsilon. \tag{19}$$

‡ Note the abuse of notation. In the proof of the Stone-Weierstrass Theorem, $f^n(x)$ denotes the
$n$th power of the real number $f(x)$, while here $f^n$ denotes the composition of $f$ with itself $n$ times.
Deal with it!
Figure 100 $f$ contracts $M$ toward the fixed-point $p$.


Note that (19) needs the hypothesis $k < 1$. If $N \le m \le n$ then (18) gives

$$\begin{aligned}
d(x_m, x_n) &\le d(x_m, x_{m+1}) + d(x_{m+1}, x_{m+2}) + \cdots + d(x_{n-1}, x_n) \\
&\le k^m d(x_0, x_1) + k^{m+1} d(x_0, x_1) + \cdots + k^{n-1} d(x_0, x_1) \\
&= k^m (1 + k + \cdots + k^{n-m-1})\, d(x_0, x_1) \\
&\le k^N \sum_{\ell=0}^{\infty} k^\ell\, d(x_0, x_1) = \frac{k^N}{1 - k}\, d(x_0, x_1) < \epsilon.
\end{aligned}$$

Thus $(x_n)$ is Cauchy. Since $M$ is complete, $x_n$ converges to some $p \in M$ as $n \to \infty$.
Let $\epsilon > 0$ be given. For large $n$, the points $x_n$ and $x_{n+1}$ lie in the $\epsilon$-neighborhood
of $p$. Since $f(x_n) = x_{n+1}$, the map $f$ moves $x_n$ a distance $< 2\epsilon$, and since $\epsilon$ is
arbitrarily small, continuity of $f$ implies $f$ does not move $p$ at all. It is a fixed-point
of $f$. Uniqueness of the fixed-point is immediate. After all, how can two points
simultaneously stay fixed and move closer together? □
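
The dynamical proof is also an algorithm: iterate $f$ until successive points stop moving. The sketch below is a minimal illustration under stated assumptions. The function name `iterate_to_fixed_point`, the tolerance, and the example map are choices made here, not part of the text. The example uses $\cos$ on $[0, 1]$, which maps $[0, 1]$ into itself and satisfies $|\cos'| \le \sin 1 < 1$ there, so it is a contraction of a complete metric space.

```python
import math


def iterate_to_fixed_point(f, x0, tol=1e-12, max_iter=10_000):
    """Iterate x_{n+1} = f(x_n) until successive iterates differ by less than tol.

    Returns the last iterate and the number of applications of f.
    """
    x = x0
    for n in range(max_iter):
        x_next = f(x)
        if abs(x_next - x) < tol:
            return x_next, n + 1
        x = x_next
    raise RuntimeError("iteration did not settle within max_iter steps")


p, steps = iterate_to_fixed_point(math.cos, 1.0)
print(p, steps, abs(math.cos(p) - p))
```

The number of steps grows only logarithmically in the tolerance, reflecting the geometric estimate (18).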


Proof #2 of the Banach Contraction Principle Choose any point $x_0 \in M$ and
choose $r_0$ so large that $f(\overline{M}_{r_0}(x_0)) \subset \overline{M}_{r_0}(x_0)$, where $\overline{M}_{r_0}(x_0)$ is the closed ball of radius $r_0$ at $x_0$. Let $B_0 = \overline{M}_{r_0}(x_0)$ and let $B_n$ be the closure of
$f(B_{n-1})$. The diameter of $B_n$ is at most $k^n \operatorname{diam}(B_0)$, and this tends to 0 as $n \to \infty$.
The sets $B_n$ nest downward as $n \to \infty$ and $f$ sends $B_n$ inside $B_{n+1}$. Since $M$ is complete,
this implies that $\bigcap B_n$ is a single point, say $p$, and $f(p) = p$. □


Proof of Brouwer’s Theorem in Dimension One The closed unit 1-ball is the
interval $[-1, 1]$ in $\mathbb{R}$. If $f : [-1, 1] \to [-1, 1]$ is continuous then so is $g(x) = x - f(x)$.
At the endpoints $\pm 1$, we have $g(-1) \le 0 \le g(1)$. By the Intermediate Value Theorem,
there is a point $p \in [-1, 1]$ such that $g(p) = 0$. That is, $f(p) = p$. □
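
In dimension one the Intermediate Value argument can be carried out by bisection on $g(x) = x - f(x)$. The following sketch is illustrative: the function name `fixed_point_by_bisection` and the example map $f(x) = 1 - 2x^2$ are assumptions chosen here. That map sends $[-1, 1]$ into itself but is not a contraction, so the iteration scheme of the Banach principle is not guaranteed to work for it, while the sign-change argument still is.

```python
def fixed_point_by_bisection(f, tol=1e-12):
    """Locate p in [-1, 1] with f(p) = p for continuous f : [-1, 1] -> [-1, 1].

    Keeps the invariant g(lo) <= 0 <= g(hi) for g(x) = x - f(x).
    """
    def g(x):
        return x - f(x)

    lo, hi = -1.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(mid) <= 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def example_map(x):
    # Maps [-1, 1] into [-1, 1]; its derivative reaches 4 in absolute value.
    return 1 - 2 * x * x


p = fixed_point_by_bisection(example_map)
print(p, example_map(p))
```

Bisection finds one fixed point; the theorem does not claim uniqueness, and this example map has a second fixed point at $x = -1$.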


The proof in higher dimensions is harder. One proof is a consequence of the 
general Stokes’ Theorem, and is given in Chapter 5. Another depends on algebraic 
topology, a third on differential topology. 


Ordinary Differential Equations 

The qualitative theory of ordinary differential equations (ODEs) begins with the
basic existence/uniqueness theorem, Picard’s Theorem. Throughout, $U$ is an open
subset of $m$-dimensional Euclidean space $\mathbb{R}^m$.

A vector ODE on $U$ is given as $m$ simultaneous scalar equations

$$\begin{aligned}
x_1' &= f_1(x_1, x_2, \ldots, x_m) \\
x_2' &= f_2(x_1, x_2, \ldots, x_m) \\
&\ \,\vdots \\
x_m' &= f_m(x_1, x_2, \ldots, x_m)
\end{aligned}$$

where each $f_i$ is a function from $U$ to $\mathbb{R}$. One seeks $m$ real-valued functions $x_1(t), \ldots,$
$x_m(t)$ such that

$$\begin{aligned}
\frac{dx_1(t)}{dt} &= f_1(x_1(t), x_2(t), \ldots, x_m(t)) \\
\frac{dx_2(t)}{dt} &= f_2(x_1(t), x_2(t), \ldots, x_m(t)) \\
&\ \,\vdots \\
\frac{dx_m(t)}{dt} &= f_m(x_1(t), x_2(t), \ldots, x_m(t))
\end{aligned}$$

hold identically and simultaneously. The functions $x_1(t), \ldots, x_m(t)$ are said to solve
the ODE with initial condition

$$(x_1(0), x_2(0), \ldots, x_m(0)).$$
The ODE can be expressed geometrically as follows. The $m$ real-valued functions
$f_i$ can be combined into a vector function $F(x) = (f_1(x), \ldots, f_m(x))$ where $x =$
$(x_1, \ldots, x_m)$. Thus $F$ is a vector field on $U$, and we seek a trajectory of $F$, that
is, a curve $\gamma : (a, b) \to U$ such that $a < 0 < b$ and for all $t \in (a, b)$ we have

$$\gamma'(t) = F(\gamma(t)) \quad\text{and}\quad \gamma(0) = p. \tag{20}$$

The components of $\gamma$ are the functions $x_i(t)$ that solve the ODE and $p$ is their initial
condition. I contend that this geometric view of an ODE as a vector field is the best
way to get intuition about it. See Figure 101.



We think of the vector field $F$ defining at each $x \in U$ a vector $F(x)$ whose foot
lies at $x$ and to which $\gamma$ must be tangent. The vector $\gamma'(t)$ is $(\gamma_1'(t), \ldots, \gamma_m'(t))$ where
$\gamma_1, \ldots, \gamma_m$ are the components of $\gamma$. The trajectory $\gamma(t)$ describes how a particle
travels with prescribed velocity $F$. At each time $t$, $\gamma(t)$ is the position of the particle;
its velocity there is exactly the vector $F$ at that point. Intuitively, trajectories should
exist because particles do move.

The contraction principle gives a way to find trajectories of vector fields, or what
is the same thing, to solve ODEs. We will assume that $F$ satisfies a Lipschitz
condition: there is a constant $L$ such that for all points $x, y \in U$ we have

$$|F(x) - F(y)| \le L|x - y|.$$

Here, $|\ |$ refers to the Euclidean length of a vector. $F$, $x$, $y$ are all vectors in $\mathbb{R}^m$. It
follows that $F$ is continuous. The Lipschitz condition is stronger than continuity, but
still fairly mild. Any differentiable vector field with a bounded derivative is Lipschitz.


25 Picard’s Theorem Given $p \in U$ there exists an $F$-trajectory $\gamma(t)$ in $U$ through
$p$. This means that $\gamma : (a, b) \to U$ solves (20). Locally, $\gamma$ is unique.


To prove Picard’s Theorem it is convenient to reexpress (20) as an integral equa- 
tion; to do this we make a brief digression about vector-valued integrals. Let’s recall 
four key facts about integrals of real- valued functions of a real variable, y — /(x), 
a < x < b. 


(a) 

(b) 

(c) 

(d) 


f a f(x) dx is approximated by Riemann sums R = ^ /(£&) Ax^ 
Continuous functions are integrable. 

If f (x) exists and is continuous then f h f (x) dx = f(b ) - f(a). 


/a f( x ) d x — M(b — a) where M — sup \f(x) 


The Riemann sum R in (a) has a — xq < • • • < x^-i < tk < x^ < • • • < x n — b and 
all the Axk — x^ — x^-i are small. 


Given a continuous vector-valued function of a real variable

$$f(x) = (f_1(x), \ldots, f_m(x)),$$

$a \le x \le b$, we define its integral componentwise as the vector of integrals

$$\int_a^b f(x)\,dx = \left(\int_a^b f_1(x)\,dx, \ldots, \int_a^b f_m(x)\,dx\right).$$


Corresponding to (a) - (d) are the following:

(a′) $\int_a^b f(x)\,dx$ is approximated by $R = (R_1, \ldots, R_m)$, with $R_j$ a Riemann sum for $f_j$.

(b′) Continuous vector-valued functions are integrable.

(c′) If $f'(x)$ exists and is continuous, then $\int_a^b f'(x)\,dx = f(b) - f(a)$.

(d′) $\left|\int_a^b f(x)\,dx\right| \le M(b - a)$ where $M = \sup |f(x)|$.

(a′), (b′), and (c′) are clear enough. To check (d′) we write

$$R = \sum_j R_j e_j = \sum_j \sum_k f_j(t_k)\,\Delta x_k\, e_j = \sum_k \Bigl(\sum_j f_j(t_k)\, e_j\Bigr)\Delta x_k = \sum_k f(t_k)\,\Delta x_k$$

where $e_1, \ldots, e_m$ is the standard vector basis for $\mathbb{R}^m$. Thus,

$$|R| \le \sum_k |f(t_k)|\,\Delta x_k \le \sum_k M\,\Delta x_k = M(b - a).$$

By (a′), $R$ approximates the integral, which implies (d′). (Note that a weaker inequality
with $M$ replaced by $\sqrt{m}\,M$ follows immediately from (d). This weaker inequality
would suffice for most of what we do but it is inelegant.)
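The componentwise integral and the estimate (d′) can be tried out numerically. The following is only a sketch: the test function $f(x) = (\cos x, \sin x)$ on $[0, \pi]$, the midpoint rule, and the helper name `vector_riemann_sum` are choices made here for illustration and do not come from the text.

```python
import math

def vector_riemann_sum(f, a, b, n=1000):
    """Midpoint Riemann sum of a vector-valued f on [a, b], computed componentwise."""
    dx = (b - a) / n
    m = len(f(a))
    total = [0.0] * m
    for k in range(n):
        value = f(a + (k + 0.5) * dx)   # sample point t_k in the k-th subinterval
        for j in range(m):
            total[j] += value[j] * dx
    return total

# Illustrative choice: |f(x)| = 1 for every x, so M = 1.
f = lambda x: (math.cos(x), math.sin(x))
R = vector_riemann_sum(f, 0.0, math.pi)
length_R = math.hypot(*R)               # Euclidean length |R|
```

Here the exact integral is the vector $(0, 2)$, whose length 2 is below $M(b - a) = \pi$, and well below the cruder bound $\sqrt{2}\,\pi$ that (d) alone would give.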

Now consider the following integral version of (20),

$$\gamma(t) = p + \int_0^t F(\gamma(s))\,ds. \tag{21}$$

A solution of (21) is by definition any continuous curve $\gamma : (a, b) \to U$ for which
(21) holds identically in $t \in (a, b)$. By (b′) any solution of (21) is automatically
differentiable and its derivative is $F(\gamma(t))$. That is, every solution of (21) solves (20).
The converse is also clear, so solving (20) is equivalent to solving (21) for a continuous
function $\gamma(t)$.

Proof of Picard’s Theorem Since $F$ is continuous, there exist a compact neighborhood
$N = \overline{N}_r(p) \subset U$ and a constant $M$ such that $|F(x)| \le M$ for all $x \in N$.
Choose $\tau > 0$ such that

$$\tau M \le r \quad\text{and}\quad \tau L < 1. \tag{22}$$

Consider the set $\mathcal{C}$ of all continuous functions $\gamma : [-\tau, \tau] \to N$. With respect to
the metric

$$d(\gamma, \sigma) = \sup\{\, |\gamma(t) - \sigma(t)| : t \in [-\tau, \tau] \,\}$$

the set $\mathcal{C}$ is a complete metric space. Given $\gamma \in \mathcal{C}$, define a new curve $\Phi(\gamma)$ as

$$\Phi(\gamma)(t) = p + \int_0^t F(\gamma(s))\,ds.$$


Solving (21) is the same as finding $\gamma$ such that $\Phi(\gamma) = \gamma$. That is, we seek a fixed
point of $\Phi$.

We just need to show that $\Phi$ is a contraction of $\mathcal{C}$. Does $\Phi$ send $\mathcal{C}$ into itself?
Given $\gamma \in \mathcal{C}$ we see that $\Phi(\gamma)(t)$ is a continuous (in fact differentiable) vector-valued
function of $t$ and that by (22),

$$|\Phi(\gamma)(t) - p| = \left|\int_0^t F(\gamma(s))\,ds\right| \le \tau M \le r.$$

Therefore, $\Phi$ does send $\mathcal{C}$ into itself. $\Phi$ contracts $\mathcal{C}$ because

$$d(\Phi(\gamma), \Phi(\sigma)) = \sup_t \left|\int_0^t F(\gamma(s)) - F(\sigma(s))\,ds\right|
\le \tau \sup_s |F(\gamma(s)) - F(\sigma(s))|
\le \tau \sup_s L\,|\gamma(s) - \sigma(s)| \le \tau L\, d(\gamma, \sigma)$$

and $\tau L < 1$ by (22). Therefore $\Phi$ has a fixed-point $\gamma$, and $\Phi(\gamma) = \gamma$ implies that $\gamma(t)$
solves (21), which implies that $\gamma$ is differentiable and solves (20).

Any other solution $\sigma(t)$ of (20) defined on the interval $[-\tau, \tau]$ also solves (21) and
is a fixed-point of $\Phi$, $\Phi(\sigma) = \sigma$. Since a contraction mapping has a unique fixed-point,
$\gamma = \sigma$, which is what local uniqueness means. □
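The fixed-point argument in the proof can be watched in action by iterating the integral operator on a grid. The sketch below makes several assumptions that are not in the text: the scalar field $F(x) = x$ with $p = 1$ (so $L = 1$ and the trajectory is $e^t$), the value $\tau = 0.5$, a uniform grid, and the trapezoid rule as a stand-in for the integral. The function name `picard_iterates` is likewise made up for the illustration.

```python
import math

def picard_iterates(F, p, tau, n_grid=2001, n_iter=12):
    """Repeatedly apply Phi(gamma)(t) = p + integral_0^t F(gamma(s)) ds on [-tau, tau]."""
    ts = [-tau + 2 * tau * i / (n_grid - 1) for i in range(n_grid)]
    mid = (n_grid - 1) // 2            # index of t = 0 (n_grid is odd)
    dt = ts[1] - ts[0]
    gamma = [p] * n_grid               # start from the constant curve gamma_0(t) = p
    distances = []                     # sup-distances between successive iterates
    for _ in range(n_iter):
        vals = [F(g) for g in gamma]
        new = [0.0] * n_grid
        new[mid] = p
        # trapezoid rule, integrating outward from t = 0 in both directions
        for i in range(mid + 1, n_grid):
            new[i] = new[i - 1] + 0.5 * dt * (vals[i - 1] + vals[i])
        for i in range(mid - 1, -1, -1):
            new[i] = new[i + 1] - 0.5 * dt * (vals[i + 1] + vals[i])
        distances.append(max(abs(a - b) for a, b in zip(new, gamma)))
        gamma = new
    return ts, gamma, distances

ts, gamma, distances = picard_iterates(lambda x: x, 1.0, 0.5)
error = max(abs(g - math.exp(t)) for t, g in zip(ts, gamma))
```

With these choices $\tau L = 0.5$, so each entry of `distances` should be at most half the previous one, and `error` measures how far the final iterate is from $e^t$ on the grid (limited by the trapezoid rule, not by the iteration).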


The $F$-trajectories define a flow in the following way: To avoid the possibility that
trajectories cross the boundary of $U$ (they “escape from $U$”) or become unbounded in
finite time (they “escape to infinity”) we assume that $U$ is all of $\mathbb{R}^m$. Then trajectories
can be defined for all time $t \in \mathbb{R}$. Let $\gamma(t, p)$ denote the trajectory through $p$. Imagine
all points $p \in \mathbb{R}^m$ moving in unison along their trajectories as $t$ increases. They are
leaves on a river, motes in a breeze. The point $p_1 = \gamma(t_1, p)$ at which $p$ arrives after
time $t_1$ moves according to $\gamma(t, p_1)$. Before $p$ arrives at $p_1$, however, $p_1$ has already
gone elsewhere. This is expressed by the flow equation

$$\gamma(t, p_1) = \gamma(t + t_1, p).$$

See Figure 102.

The flow equation is true because as functions of $t$ both sides of the equation are
$F$-trajectories through $p_1$, and the $F$-trajectory through a point is locally unique. It is
revealing to rewrite the flow equation with different notation. Setting $\varphi_t(p) = \gamma(t, p)$
gives

$$\varphi_{t+s}(p) = \varphi_t(\varphi_s(p)) \quad\text{for all } t, s \in \mathbb{R}.$$

$\varphi_t$ is called the $t$-advance map. It specifies where each point moves after time $t$.
See Figure 103.

Figure 102 The time needed to flow from $p$ to $p_2$ is the sum of the
times needed to flow from $p$ to $p_1$ and from $p_1$ to $p_2$.

Figure 103 The $t$-advance map shows how a set $A$ flows to a set $\varphi_t(A)$.

The flow equation states that $t \mapsto \varphi_t$ is a group homomorphism from
$\mathbb{R}$ into the group of motions of $\mathbb{R}^m$. In fact each $\varphi_t$ is a homeomorphism of $\mathbb{R}^m$ onto
itself and its inverse is $\varphi_{-t}$. For $\varphi_{-t} \circ \varphi_t = \varphi_0$ and $\varphi_0$ is the time-zero map where
nothing moves at all, $\varphi_0 = $ identity map.
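The group property of the advance maps can be checked numerically for a concrete field. The sketch below assumes the scalar field $F(x) = \sin x$ on $\mathbb{R}$ (Lipschitz with $L = 1$) and approximates the $t$-advance map with the classical fourth-order Runge-Kutta method; both the field and the integrator are illustrative choices, and `advance` is a hypothetical helper name.

```python
import math

def advance(F, p, t, steps=2000):
    """Approximate the t-advance map phi_t(p) for x' = F(x) with classical RK4."""
    h = t / steps          # h is negative when t < 0, which flows backward in time
    x = p
    for _ in range(steps):
        k1 = F(x)
        k2 = F(x + 0.5 * h * k1)
        k3 = F(x + 0.5 * h * k2)
        k4 = F(x + h * k3)
        x += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return x

p, s, t = 0.3, 1.1, 0.7
lhs = advance(math.sin, p, t + s)                     # phi_{t+s}(p)
rhs = advance(math.sin, advance(math.sin, p, s), t)   # phi_t(phi_s(p))
back = advance(math.sin, advance(math.sin, p, t), -t) # phi_{-t}(phi_t(p))
```

Up to the integrator's error, `lhs` and `rhs` agree and `back` returns to `p`, which is the flow equation and the statement that $\varphi_{-t}$ inverts $\varphi_t$.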


6* Analytic Functions 

Recall from Chapter 3 that a function $f : (a, b) \to \mathbb{R}$ is analytic if it can be expressed
locally as a power series. For each $x \in (a, b)$ there exists a convergent power series
$\sum c_k h^k$ such that for all $x + h$ near $x$ we have

$$f(x + h) = \sum_{k=0}^{\infty} c_k h^k.$$

As we have shown previously, every analytic function is smooth but not every smooth
function is analytic. In this section we give a necessary and sufficient condition that
a smooth function be analytic. It involves the speed with which the $r$th derivative
grows as $r \to \infty$.


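Let $f : (a, b) \to \mathbb{R}$ be smooth. The Taylor series for $f$ at $x \in (a, b)$ is

$$\sum_{k=0}^{\infty} \frac{f^{(k)}(x)}{k!}\, h^k.$$

Let $I = [x - \sigma, x + \sigma]$ be a subinterval of $(a, b)$, $\sigma > 0$, and denote by $M_r$ the
maximum of $|f^{(r)}(t)|$ for $t \in I$. The derivative growth rate of $f$ on $I$ is

$$\alpha = \limsup_{r \to \infty} \sqrt[r]{\frac{M_r}{r!}}.$$

Clearly, $\sqrt[r]{|f^{(r)}(x)|/r!} \le \sqrt[r]{M_r/r!}$, so the radius of convergence

$$R = \frac{1}{\displaystyle\limsup_{r \to \infty} \sqrt[r]{\frac{|f^{(r)}(x)|}{r!}}}$$

of the Taylor series at $x$ satisfies

$$\frac{1}{\alpha} \le R.$$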


In particular, if $\alpha$ is finite the radius of convergence of the Taylor series is positive.

26 Theorem If $\alpha\sigma < 1$ then the Taylor series converges uniformly to $f$ on the
interval $I$.

Proof Choose $\delta > 0$ such that $(\alpha + \delta)\sigma < 1$. The Taylor remainder formula from
Chapter 3, applied to the $(r-1)$st-order remainder, gives

$$f(x + h) - \sum_{k=0}^{r-1} \frac{f^{(k)}(x)}{k!}\, h^k = \frac{f^{(r)}(\theta)}{r!}\, h^r$$

for some $\theta$ between $x$ and $x + h$. Thus, for $r$ large we have

$$\left| f(x + h) - \sum_{k=0}^{r-1} \frac{f^{(k)}(x)}{k!}\, h^k \right| \le \frac{M_r}{r!}\,\sigma^r
= \left( \left(\frac{M_r}{r!}\right)^{1/r} \sigma \right)^{r} \le \bigl((\alpha + \delta)\sigma\bigr)^r.$$

Since $(\alpha + \delta)\sigma < 1$, the Taylor series converges uniformly to $f(x + h)$ on $I$. □


27 Theorem If $f$ is expressed as a convergent power series $f(x + h) = \sum c_k h^k$ with
radius of convergence $R > \sigma$ then $f$ has bounded derivative growth rate on $I$.

The proof of Theorem 27 uses two estimates about the growth rate of factorials.
If you know Stirling’s formula they are easy, but we prove them directly.

$$\lim_{r \to \infty} \sqrt[r]{\frac{r^r}{r!}} = e \tag{23}$$

$$0 < \lambda < 1 \;\Longrightarrow\; \limsup_{r \to \infty} \sqrt[r]{\sum_{k=r}^{\infty} \binom{k}{r} \lambda^k} < \infty. \tag{24}$$

Taking logarithms, applying the integral test, and ignoring terms that tend to
zero as $r \to \infty$ gives

$$\frac{1}{r}(\log r^r - \log r!) = \log r - \frac{1}{r}\bigl(\log r + \log(r - 1) + \cdots + \log 1\bigr)
\sim \log r - \frac{1}{r}\int_1^r \log x\,dx
= \log r - \frac{1}{r}\,(x \log x - x)\Big|_1^r = 1 - \frac{1}{r},$$


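which tends to 1 as $r \to \infty$. This proves (23).

To prove (24) we write $\lambda = e^{-\mu}$ for $\mu > 0$, and reason similarly:

$$\sum_{k=r}^{\infty} \binom{k}{r} \lambda^k = \sum_{k=r}^{\infty} \frac{k(k-1)(k-2)\cdots(k-r+1)}{r!}\, e^{-k\mu}
\le \frac{1}{r!} \sum_{k=r}^{\infty} k^r e^{-k\mu} \sim \frac{1}{r!} \int_r^{\infty} x^r e^{-\mu x}\,dx$$

$$= -\frac{1}{r!}\, e^{-\mu x} \left( \frac{x^r}{\mu} + \frac{r x^{r-1}}{\mu^2} + \frac{r(r-1) x^{r-2}}{\mu^3} + \cdots + \frac{r!}{\mu^{r+1}} \right) \Bigg|_r^{\infty}
< \frac{(r + 1)\, r^r}{r!\, \min(1, \mu)^{r+1}}.$$

The last inequality holds because the bracket, evaluated at $x = r$, has $r + 1$ terms, each of
which is at most $r^r / \min(1, \mu)^{r+1}$, and $e^{-\mu r} < 1$.
According to (23) the $r$th root of this quantity tends to $e / \min(1, \mu)$ as $r \to \infty$,
completing the proof of (24).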
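The limit (23) is easy to probe numerically. The sketch below works with logarithms, using the standard-library function `math.lgamma` (which returns $\log \Gamma(r + 1) = \log r!$), so that large $r$ does not overflow; the helper name `ratio` and the sample values of $r$ are arbitrary choices for the illustration.

```python
import math

def ratio(r):
    """Return (r^r / r!)^(1/r) = r / (r!)^(1/r), computed through logarithms."""
    log_value = math.log(r) - math.lgamma(r + 1) / r   # (1/r)(log r^r - log r!)
    return math.exp(log_value)

values = [ratio(r) for r in (10, 100, 1000, 10**6)]
gap = math.e - values[-1]   # how far the last value still is from e
```

The values increase toward $e$ rather slowly, in line with the $1 - 1/r$ behavior of the logarithm found in the proof.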

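Proof of Theorem 27 By assumption the power series $\sum c_k h^k$ has radius of convergence
$R$ and $\sigma < R$. Since $1/R$ is the limsup of $\sqrt[k]{|c_k|}$ as $k \to \infty$, there is a
number $\lambda < 1$ such that for all large $k$ we have $|c_k \sigma^k| \le \lambda^k$. Differentiating the series
term by term with $|h| \le \sigma$ gives

$$|f^{(r)}(x + h)| \le \sum_{k=r}^{\infty} k(k-1)(k-2)\cdots(k-r+1)\,|c_k h^{k-r}|
\le \frac{r!}{\sigma^r} \sum_{k=r}^{\infty} \binom{k}{r} |c_k \sigma^k|
\le \frac{r!}{\sigma^r} \sum_{k=r}^{\infty} \binom{k}{r} \lambda^k$$

for $r$ large. Thus,

$$M_r = \sup_{|h| \le \sigma} |f^{(r)}(x + h)| \le \frac{r!}{\sigma^r} \sum_{k=r}^{\infty} \binom{k}{r} \lambda^k.$$

According to (24)

$$\alpha = \limsup_{r \to \infty} \sqrt[r]{\frac{M_r}{r!}} \le \frac{1}{\sigma} \limsup_{r \to \infty} \sqrt[r]{\sum_{k=r}^{\infty} \binom{k}{r} \lambda^k} < \infty,$$

and $f$ has bounded derivative growth rate on $I$. □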


From Theorems 26 and 27 we deduce the main result of this section. 

28 Analyticity Theorem A smooth function is analytic if and only if it has locally
bounded derivative growth rate.

Proof Assume that $f : (a, b) \to \mathbb{R}$ is smooth and has locally bounded derivative
growth rate. Then $x \in (a, b)$ has a neighborhood $N$ on which the derivative growth
rate $\alpha$ is finite. Choose $\sigma > 0$ such that $I = [x - \sigma, x + \sigma] \subset N$ and $\alpha\sigma < 1$. We
infer from Theorem 26 that the Taylor series for $f$ at $x$ converges uniformly to $f$ on
$I$. Hence $f$ is analytic.

Conversely, assume that $f$ is analytic and let $x \in (a, b)$ be given. There is a power
series $\sum c_k h^k$ that converges to $f(x + h)$ for all $h$ in some interval $(-R, R)$ with $R > 0$.
Choose $\sigma$ with $0 < \sigma < R$. We infer from Theorem 27 that $f$ has bounded derivative
growth rate on $I$. □


29 Corollary A smooth function is analytic if its derivatives are uniformly bounded.

An example of such a function is $f(x) = \sin x$.

Proof If $|f^{(r)}(\theta)| \le M$ for all $r$ and $\theta$ then the derivative growth rate of $f$ is bounded.
In fact, $\alpha = 0$ and $R = \infty$. □
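For $f(x) = \sin x$ the remainder estimate from the proof of Theorem 26 can be observed directly, since every derivative is bounded by $M = 1$ and the remainder after $r$ terms is at most $\sigma^r / r!$ on $I = [x - \sigma, x + \sigma]$. The sketch below assumes the sample values $x = 1$, $\sigma = 3$, $r = 20$ and a finite grid of test points; `sin_taylor` is a hypothetical helper.

```python
import math

def sin_taylor(x, h, r):
    """Taylor polynomial of sin at x, of degree r - 1, evaluated at x + h."""
    # The derivatives of sin at x cycle with period 4: sin, cos, -sin, -cos.
    derivs = [math.sin(x), math.cos(x), -math.sin(x), -math.cos(x)]
    total, power, fact = 0.0, 1.0, 1.0
    for k in range(r):
        total += derivs[k % 4] * power / fact   # f^(k)(x) h^k / k!
        power *= h
        fact *= (k + 1)
    return total

x, sigma, r = 1.0, 3.0, 20
bound = sigma ** r / math.factorial(r)          # M_r sigma^r / r! with M_r <= 1
hs = [-sigma + 2 * sigma * i / 600 for i in range(601)]
worst = max(abs(math.sin(x + h) - sin_taylor(x, h, r)) for h in hs)
```

On the sampled points `worst` stays below `bound`, and increasing `r` drives the bound to zero for any fixed $\sigma$, which reflects $\alpha = 0$ and $R = \infty$.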


30 Taylor’s Theorem If $f(x) = \sum c_k x^k$ and the power series has radius of convergence
$R$ then $f$ is analytic on $(-R, R)$.

Proof The function $f$ is smooth, and by Theorem 27 it has bounded derivative
growth rate on each compact interval $I \subset (-R, R)$. Hence it is analytic. □

Taylor’s Theorem states that not only can $f$ be expanded as a convergent power
series at $x = 0$, but also at any other point $x_0 \in (-R, R)$. Other proofs of Taylor’s
theorem rely more heavily on series manipulations and Mertens’ theorem (Exercise 73
in Chapter 3).

The concept of analyticity extends immediately to complex functions. A function
$f : D \to \mathbb{C}$ is complex analytic if $D$ is an open subset of $\mathbb{C}$ and for each $z \in D$
there is a power series

$$\sum c_k \zeta^k$$

such that for all $z + \zeta$ near $z$,

$$f(z + \zeta) = \sum_{k=0}^{\infty} c_k \zeta^k.$$

The coefficients $c_k$ are complex and so is the variable $\zeta$. Convergence occurs on a
disc of radius $R$. This lets us define $e^z$, $\log z$, $\sin z$, $\cos z$ for the complex number $z$
by setting

$$e^z = \sum_{k=0}^{\infty} \frac{z^k}{k!}$$

$$\sin z = \sum_{k=0}^{\infty} \frac{(-1)^k z^{2k+1}}{(2k+1)!}$$

$$\log(1 + z) = \sum_{k=1}^{\infty} \frac{(-1)^{k+1} z^k}{k} \quad\text{when } |z| < 1$$

$$\cos z = \sum_{k=0}^{\infty} \frac{(-1)^k z^{2k}}{(2k)!}.$$

It is enlightening and reassuring to derive formulas such as

$$e^{i\theta} = \cos\theta + i \sin\theta$$

directly from these definitions. (Just plug in $z = i\theta$ and use the equations $i^2 = -1$,
$i^3 = -i$, $i^4 = 1$, etc.) A key formula to check is $e^{z+w} = e^z e^w$. One proof involves
a manipulation of product series; a second merely uses analyticity. Another formula
is $\log(e^z) = z$.
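These identities can be sanity-checked by summing the defining series with complex arithmetic. The sketch below truncates the exponential series after 60 terms, which is an arbitrary cutoff that is ample for the small arguments used; the sample values of $\theta$, $z$, $w$ and the name `exp_series` are illustrative assumptions.

```python
import math

def exp_series(z, terms=60):
    """Partial sum of the power series sum_k z^k / k! for a complex number z."""
    total = 0j
    term = 1 + 0j                 # z^0 / 0!
    for k in range(terms):
        total += term
        term *= z / (k + 1)       # turns z^k / k! into z^(k+1) / (k+1)!
    return total

theta = 0.75
euler_lhs = exp_series(1j * theta)
euler_rhs = complex(math.cos(theta), math.sin(theta))

z, w = 0.3 + 0.4j, -0.2 + 1.1j
product_gap = abs(exp_series(z + w) - exp_series(z) * exp_series(w))
```

Both `abs(euler_lhs - euler_rhs)` and `product_gap` come out at the level of floating-point roundoff. This is numerical evidence only, not a substitute for the series manipulations mentioned above.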

There are many natural results about real analytic functions that can be proved 
by direct power series means; e.g., the sum, product, reciprocal, composite, and 
inverse function of analytic functions are analytic. Direct proofs, like those for the 
Analyticity Theorem above, involve major series manipulations. The use of complex 
variables leads to greatly simplified proofs of these real variable theorems, thanks to 
the following fact. 


Real analyticity propagates to complex analyticity and
complex analyticity is equivalent to complex differentiability.†

For it is relatively easy to check that the composition, etc., of complex differentiable
functions is complex differentiable.

† A function $f : D \to \mathbb{C}$ is complex differentiable or holomorphic if $D$ is an open
subset of $\mathbb{C}$ and for each $z \in D$, the limit of

$$\frac{\Delta f}{\Delta z} = \frac{f(z + \Delta z) - f(z)}{\Delta z}$$

exists as $\Delta z \to 0$ in $\mathbb{C}$. The limit, if it exists, is a complex number.

The analyticity concept extends even beyond $\mathbb{C}$. You may already have seen such
an extension when you studied the vector linear ODE

$$x' = Ax$$

in calculus. $A$ is a given $m \times m$ matrix and the unknown solution $x = x(t)$ is a
vector function of $t$, on which an initial condition $x(0) = x_0$ is usually imposed. A
vector ODE is equivalent to $m$ coupled, scalar, linear ODEs. The solution $x(t)$ can
be expressed as

$$x(t) = e^{tA} x_0$$

where

$$e^{tA} = \lim_{n \to \infty} \left( I + tA + \frac{1}{2!}(tA)^2 + \cdots + \frac{1}{n!}(tA)^n \right) = \sum_{k=0}^{\infty} \frac{t^k}{k!} A^k.$$

$I$ is the $m \times m$ identity matrix. View this series as a power series with $k$th coefficient
$t^k/k!$ and variable $A$. ($A$ is a matrix variable!) The limit exists in the space of all
$m \times m$ matrices, and its product with the constant vector $x_0$ does indeed give a vector
function of $t$ that solves the original linear ODE.
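As a concrete instance, the matrix exponential series can be summed for a $2 \times 2$ matrix whose exponential is known in closed form. The sketch below assumes $A = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$, for which $A^2 = -I$ and hence $e^{tA} = (\cos t) I + (\sin t) A$; the truncation at 40 terms and the helper names `mat_mul` and `mat_exp` are choices made for the illustration.

```python
import math

def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def mat_exp(A, t, terms=40):
    """Partial sum of the series sum_k (t^k / k!) A^k."""
    n = len(A)
    total = [[0.0] * n for _ in range(n)]
    term = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # (tA)^0 / 0! = I
    for k in range(terms):
        total = [[total[i][j] + term[i][j] for j in range(n)] for i in range(n)]
        term = mat_mul(term, A)                                   # multiply by A ...
        term = [[t * entry / (k + 1) for entry in row] for row in term]  # ... and by t/(k+1)
    return total

A = [[0.0, 1.0], [-1.0, 0.0]]
t = 1.3
E = mat_exp(A, t)
expected = [[math.cos(t), math.sin(t)], [-math.sin(t), math.cos(t)]]
```

With initial condition $x_0 = (1, 0)$ the solution $e^{tA} x_0$ is the first column of `E`, namely $(\cos t, -\sin t)$, which does satisfy $x' = Ax$ for this $A$.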

The previous series defines the exponential of a matrix as $e^A = \sum A^k/k!$. You
might ask yourself - is there such a thing as the logarithm of a matrix? A function that
assigns to a matrix its matrix logarithm? A power series that expresses the matrix
logarithm? What about other analytic functions? Is there such a thing as the sine
of a matrix? What about inverting a matrix? Is there a power series that expresses
matrix inversion? Are formulas such as $\log A^2 = 2 \log A$ true? These questions are
explored in nonlinear functional analysis.

A terminological point on which to insist is that the word “analytic” be defined as 
“locally power series expressible.” In the complex case, some mathematicians define 
complex analyticity as complex differentiability, and although complex differentiabil- 
ity turns out to be equivalent to local expressibility as a complex power series, this is 
a very special feature of C. In fact it is responsible for every distinction between real 
and complex analysis. For cross-theory consistency, then, one should use the word 
“analytic” to mean local power series expressible, and use “differentiable” to mean 
differentiable. Why confound the two ideas? 


7* Nowhere Differentiable Continuous Functions 


Although many continuous functions, such as $|x|$, $\sqrt[3]{x}$, and $x \sin(1/x)$ fail to be
differentiable at a few points, it is quite surprising that there can exist a function which
is everywhere continuous but nowhere differentiable.


31 Theorem There exists a continuous function $f : \mathbb{R} \to \mathbb{R}$ that has a derivative at
no point whatsoever.

Proof The construction is due to Weierstrass. The letters $k, m, n$ denote integers.

Start with a sawtooth function $\sigma_0 : \mathbb{R} \to \mathbb{R}$ defined as

$$\sigma_0(x) = \begin{cases} x - 2n & \text{if } 2n \le x \le 2n + 1 \\ 2n + 2 - x & \text{if } 2n + 1 \le x \le 2n + 2. \end{cases}$$

$\sigma_0$ is periodic with period 2; if $t = x + 2m$ then $\sigma_0(t) = \sigma_0(x)$. The compressed
sawtooth function

$$\sigma_k(x) = \left(\frac{3}{4}\right)^k \sigma_0(4^k x)$$

has period $\pi_k = 2/4^k$. If $t = x + m\pi_k$ then $\sigma_k(t) = \sigma_k(x)$. See Figure 104.







Figure 104 The graphs of the sawtooth function and two compressed
sawtooth functions

According to the $M$-test, the series $\sum \sigma_k(x)$ converges uniformly to a limit $f$, and

$$f(x) = \sum_{k=0}^{\infty} \sigma_k(x)$$

is continuous. We claim that $f$ is nowhere differentiable. Fix an arbitrary point $x$,
and set $\delta_n = 1/(2 \cdot 4^n)$. We will show that

$$\frac{\Delta f}{\Delta x} = \frac{f(x \pm \delta_n) - f(x)}{\delta_n}$$

does not converge to a limit as $\delta_n \to 0$, and thus that $f'(x)$ does not exist. The
quotient is

$$\frac{\Delta f}{\Delta x} = \sum_{k=0}^{\infty} \frac{\sigma_k(x \pm \delta_n) - \sigma_k(x)}{\delta_n}.$$

There are three types of terms in the series, $k > n$, $k = n$, and $k < n$. If $k > n$ then
$\sigma_k(x \pm \delta_n) - \sigma_k(x) = 0$. For $\delta_n$ is an integer multiple of the period of $\sigma_k$,

$$\delta_n = \frac{1}{2 \cdot 4^n} = 4^{k-(n+1)} \cdot \frac{2}{4^k} = 4^{k-(n+1)} \cdot \pi_k.$$

Thus the infinite series expression for $\Delta f / \Delta x$ reduces to a sum of $n + 1$ terms

$$\frac{\Delta f}{\Delta x} = \frac{\sigma_n(x \pm \delta_n) - \sigma_n(x)}{\delta_n} + \sum_{k=0}^{n-1} \frac{\sigma_k(x \pm \delta_n) - \sigma_k(x)}{\delta_n}.$$

The function $\sigma_n$ is monotone on either $[x - \delta_n, x]$ or $[x, x + \delta_n]$, since it is monotone
on intervals of length $4^{-n}$ and the contiguous interval $[x - \delta_n, x + \delta_n]$ at $x$ is of
length $4^{-n}$. The slope of $\sigma_n$ is $\pm 3^n$. Thus, either

$$\left| \frac{\sigma_n(x + \delta_n) - \sigma_n(x)}{\delta_n} \right| = 3^n \quad\text{or}\quad \left| \frac{\sigma_n(x - \delta_n) - \sigma_n(x)}{\delta_n} \right| = 3^n.$$

The terms with $k < n$ are crudely estimated from the slope of $\sigma_k$ being $\pm 3^k$:

$$\left| \frac{\sigma_k(x \pm \delta_n) - \sigma_k(x)}{\delta_n} \right| \le 3^k.$$

Thus

$$\left| \frac{\Delta f}{\Delta x} \right| \ge 3^n - (3^{n-1} + \cdots + 1) = 3^n - \frac{3^n - 1}{3 - 1} = \frac{1}{2}(3^n + 1),$$

which tends to $\infty$ as $\delta_n \to 0$, so $f'(x)$ does not exist.


□

By simply writing down a sawtooth series as above, Weierstrass showed that
there exists a nowhere differentiable continuous function. Yet more amazing is the
fact that most continuous functions (in a reasonable sense defined below) are nowhere
differentiable. If you could pick a continuous function at random, it would be nowhere
differentiable.
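The lower bound $\frac{1}{2}(3^n + 1)$ obtained in the proof of Theorem 31 can be observed numerically. The sketch below evaluates the difference quotients of the sawtooth series over the steps $\pm\delta_n$ in floating point; the sample points $x$ and the range of $n$ are arbitrary, and the function names are made up for the illustration. It relies on the fact from the proof that terms with $k > n$ cancel, so only $k = 0, \ldots, n$ need to be summed.

```python
def sawtooth(x):
    """sigma_0: the period-2 sawtooth with values in [0, 1]."""
    r = x % 2.0
    return r if r <= 1.0 else 2.0 - r

def sigma(k, x):
    """Compressed sawtooth sigma_k(x) = (3/4)^k sigma_0(4^k x)."""
    return 0.75 ** k * sawtooth(4.0 ** k * x)

def quotient(x, n, sign):
    """Difference quotient of f = sum_k sigma_k over the step sign * delta_n.

    Terms with k > n cancel because delta_n is a multiple of their period,
    so summing k = 0..n already gives the quotient for the full series.
    """
    delta = 1.0 / (2.0 * 4.0 ** n)
    total = sum(sigma(k, x + sign * delta) - sigma(k, x) for k in range(n + 1))
    return total / (sign * delta)

def larger_quotient(x, n):
    """The larger in absolute value of the right-hand and left-hand quotients."""
    return max(abs(quotient(x, n, 1)), abs(quotient(x, n, -1)))

growth = [larger_quotient(0.3, n) for n in range(1, 9)]
```

At each sampled point the larger of the two one-sided quotients is at least $\frac{1}{2}(3^n + 1)$, so the entries of `growth` blow up geometrically, as the proof predicts.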

Recall that the set $D \subset M$ is dense in $M$ if $D$ meets every nonempty open subset
$W$ of $M$, $D \cap W \ne \emptyset$. The intersection of two dense sets need not be dense; it can be
empty, as is the case with $\mathbb{Q}$ and $\mathbb{Q}^c$ in $\mathbb{R}$. On the other hand if $U$, $V$ are open-dense
sets in $M$ then $U \cap V$ is open-dense in $M$. For if $W$ is any nonempty open subset of
$M$ then $U \cap W$ is a nonempty open subset of $M$, and by denseness of $V$, we see that
$V$ meets $U \cap W$; i.e., $U \cap V \cap W$ is nonempty and $U \cap V$ meets $W$.

Moral Open dense sets do a good job of being dense. 

The countable intersection $G = \bigcap G_n$ of open-dense sets is called a thick (or
residual†) subset of $M$, due to the following result, which we will apply in the
complete metric space $C^0([a, b], \mathbb{R})$. Extending our vocabulary in a natural way we
say that the complement of a thick set is thin (or meager). A subset $H$ of $M$ is thin
if and only if it is a countable union of nowhere dense closed sets, $H = \bigcup H_n$. Clearly,
thickness and thinness are topological properties. A thin set is the topological analog
of a zero set (a set whose outer measure is zero).

32 Baire’s Theorem Every thick subset of a complete metric space $M$ is dense in
$M$. A nonempty, complete metric space is not thin. That is, if $M$ is the union of
countably many closed sets then at least one has nonempty interior.

If all points in a thick subset of $M$ satisfy some condition then the condition is
said to be generic. We also say that “most” points of $M$ obey the condition. As a
consequence of Baire’s theorem and the Weierstrass Approximation Theorem we will
prove

33 Theorem The generic $f \in C^0 = C^0([a, b], \mathbb{R})$ is differentiable at no point of $[a, b]$,
nor does it even have a left or right derivative at any $x \in [a, b]$, nor is it monotone
on any subinterval of $[a, b]$.

Using Lebesgue’s monotone differentiation theorem from Chapter 6 (monotonicity
implies differentiability almost everywhere), one can see that the second assertion
follows from the first, but below we give a direct proof.

† “Residual” is an unfortunate choice of words. It connotes smallness, when it should connote just
the opposite.

Before getting into the proofs of Baire’s theorem and Theorem 33, we further
discuss thickness, thinness, and genericity. The empty set is always thin and the full
space $M$ is always thick in itself. A single open-dense subset is thick and a single
closed nowhere dense subset is thin. $\mathbb{R} \setminus \mathbb{Z}$ is a thick subset of $\mathbb{R}$ and the Cantor set is
a thin subset of $\mathbb{R}$. Likewise $\mathbb{R}$ is a thin subset of $\mathbb{R}^2$. The generic point of $\mathbb{R}$ does not
lie in the Cantor set. The generic point of $\mathbb{R}^2$ does not lie on the $x$-axis. Although
$\mathbb{R} \setminus \mathbb{Z}$ is a thick subset of $\mathbb{R}$ it is not a thick subset of $\mathbb{R}^2$. The set $\mathbb{Q}$ is a thin subset
of $\mathbb{R}$. It is the countable union of its points, each of which is a closed nowhere dense
set. $\mathbb{Q}^c$ is a thick subset of $\mathbb{R}$. The generic real number is irrational. In the same
vein:

(a) The generic square matrix has determinant $\ne 0$.

(b) The generic linear transformation $\mathbb{R}^m \to \mathbb{R}^m$ is an isomorphism.

(c) The generic linear transformation $\mathbb{R}^m \to \mathbb{R}^{m-k}$ is onto.

(d) The generic linear transformation $\mathbb{R}^m \to \mathbb{R}^{m+k}$ is one-to-one.

(e) The generic pair of lines in $\mathbb{R}^3$ are skew (nonparallel and disjoint).

(f) The generic plane in $\mathbb{R}^3$ meets the three coordinate axes in three distinct points.

(g) The generic $n$th-degree polynomial has $n$ distinct roots.

In an incomplete metric space such as $\mathbb{Q}$, thickness and thinness have no bite -
every subset of $\mathbb{Q}$, even the empty set, is thick in $\mathbb{Q}$.

Proof of Baire’s Theorem If $M = \emptyset$, the proof is trivial, so we assume $M \ne \emptyset$.
Let $G = \bigcap G_n$ be a thick subset of $M$, each $G_n$ being open-dense in $M$. Let $p_0 \in M$
and $\epsilon > 0$ be given. Choose a sequence of points $p_n \in M$ and radii $r_n > 0$ such that
$r_n < 1/n$ and

$$\begin{aligned}
M_{2r_1}(p_1) &\subset M_\epsilon(p_0) \\
M_{2r_2}(p_2) &\subset M_{r_1}(p_1) \cap G_1 \\
&\;\;\vdots \\
M_{2r_n}(p_n) &\subset M_{r_{n-1}}(p_{n-1}) \cap G_1 \cap \cdots \cap G_{n-1}.
\end{aligned}$$

See Figure 105. Then

$$M_\epsilon(p_0) \supset \overline{M}_{r_1}(p_1) \supset \overline{M}_{r_2}(p_2) \supset \cdots.$$

The diameters of these closed sets tend to 0 as $n \to \infty$. Thus $(p_n)$ is a Cauchy
sequence and it converges to some $p \in M$ by completeness. The point $p$ belongs to
each set $\overline{M}_{r_n}(p_n)$ and therefore it belongs to each $G_n$. Thus $p \in G \cap M_\epsilon(p_0)$ and $G$
is dense in $M$.

Figure 105 The closed neighborhoods $\overline{M}_{r_n}(p_n)$ nest down to a point.

To check that $M$ is not thin, we take complements. Suppose that $M = \bigcup K_n$ and
$K_n$ is closed. If each $K_n$ has empty interior then each $G_n = K_n^c$ is open-dense and

$$G = \bigcap G_n = \Bigl(\bigcup K_n\Bigr)^c = \emptyset,$$

a contradiction to density of $G$. □


34 Corollary No subset of a complete nonempty metric space is both thick and thin.

Proof If $S$ is both a thick and thin subset of $M$ then $M \setminus S$ is also both thick and
thin. The intersection of two thick subsets of $M$ is thick, so $\emptyset = S \cap (M \setminus S)$ is a
thick subset of $M$. By Baire’s Theorem, this empty set is dense in $M$, so $M$ is empty,
contrary to the hypothesis. □


Proof of Theorem 33 For $n \in \mathbb{N}$ define

$$R_n = \{ f \in C^0 : \forall x \in [a, b - 1/n]\ \exists\, h > 0 \text{ such that } |\Delta f / h| > n \}$$

$$L_n = \{ f \in C^0 : \forall x \in [a + 1/n, b]\ \exists\, h < 0 \text{ such that } |\Delta f / h| > n \}$$

$$G_n = \{ f \in C^0 : f \text{ restricted to any interval of length } 1/n \text{ is nonmonotone} \}$$

where $\Delta f = f(x + h) - f(x)$. We claim that each of these sets is open-dense in $C^0$.

To check denseness it is enough to prove that the closures of $R_n$, $L_n$, and $G_n$
contain the set $\mathcal{P}$ of polynomials. For by the Weierstrass Approximation Theorem $\mathcal{P}$
is dense in $C^0$. (A set whose closure contains a dense set is dense itself.)

Fix $n$, fix a $P \in \mathcal{P}$, and let $\epsilon > 0$ be given. Consider a sawtooth function $\sigma$ which
has period $< 1/n$, size $< \epsilon$, and

$$\min_x \{ |\operatorname{slope}_x(\sigma)| \} > n + \max_x \{ |\operatorname{slope}_x(P)| \}.$$

Since the slopes of $\sigma$ are far greater than those of $P$, the slopes of $f = P + \sigma$ alternate
in sign with period $< 1/(2n)$. At any $x \in [a, b - 1/n]$, $f$ has a rightward slope either
greater than $n$ or less than $-n$. Thus $f \in R_n$. Likewise $f \in L_n$ and $f \in G_n$, so the three sets are dense in
$C^0$.

Next we prove $R_n$ is open. Let $f \in R_n$ be given. For each $x \in [a, b - 1/n]$ there
is an $h = h(x) > 0$ such that

$$\left| \frac{f(x + h) - f(x)}{h} \right| > n.$$

Since $f$ is continuous, there is a neighborhood $T_x$ of $x$ in $[a, b]$ and a constant
$\nu = \nu(x) > 0$ such that this same $h$ yields

$$\left| \frac{f(t + h) - f(t)}{h} \right| > n + \nu$$

for all $t \in T_x$. Since $[a, b - 1/n]$ is compact, finitely many of these neighborhoods $T_x$
cover it, say $T_{x_1}, \ldots, T_{x_m}$. Continuity of $f$ implies that for all $t \in \overline{T}_{x_i}$ we have

$$\left| \frac{f(t + h_i) - f(t)}{h_i} \right| \ge n + \nu_i \tag{25}$$

where $h_i = h(x_i)$ and $\nu_i = \nu(x_i)$. These $m$ inequalities for points $t$ in the $m$ sets $\overline{T}_{x_i}$
remain nearly valid if $f$ is replaced by a function $g$ with $d(f, g)$ small enough. Then
(25) becomes

$$\left| \frac{g(t + h_i) - g(t)}{h_i} \right| > n, \tag{26}$$

which means that $g \in R_n$ and $R_n$ is open in $C^0$. Similarly $L_n$ is open in $C^0$.

Checking that $G_n$ is open is easier. If $(f_k)$ is a sequence of functions in $G_n^c$ and
$f_k \rightrightarrows f$ then we must show that $f \in G_n^c$. Each $f_k$ is monotone on some interval $I_k$
of length $1/n$. There is a subsequence of these intervals that converges to a limit
interval $I$. Its length is $1/n$ and by uniform convergence, $f$ is monotone on $I$. Hence
$G_n^c$ is closed and $G_n$ is open, which completes the proof that each set $R_n$, $L_n$, $G_n$ is
open-dense in $C^0$.

Finally, if $f$ belongs to the thick set

$$\bigcap_{n=1}^{\infty} R_n \cap L_n \cap G_n$$

then for each $x \in [a, b]$ there are sequences $h_n^{\pm} \ne 0$ such that $h_n^- < 0 < h_n^+$ and

$$\left| \frac{f(x + h_n^-) - f(x)}{h_n^-} \right| > n, \qquad \left| \frac{f(x + h_n^+) - f(x)}{h_n^+} \right| > n.$$

The numerator of these fractions is at most $2\|f\|$, so $h_n^{\pm} \to 0$ as $n \to \infty$. Thus $f$ is
not differentiable at $x$, nor does it even have a left or right derivative at $x$. Also, $f$
is nonmonotone on every interval of length $1/n$. Since every interval $J$ contains an
interval of length $1/n$ when $n$ is large enough, $f$ is nonmonotone on $J$. □


Further generic properties of continuous functions have been studied, and you 
might read about them in the books A Primer of Real Functions by Ralph Boas, 
Differentiation of Real Functions by Andrew Bruckner, or A Second Course in Real 
Functions by van Rooij and Schikhof. 


8* Spaces of Unbounded Functions 

When we contemplate equicontinuity, how important is it that the functions we deal
with are bounded, or have domain $[a, b]$ and target $\mathbb{R}$? To some extent we can replace
$[a, b]$ with a metric space $X$ and $\mathbb{R}$ with a complete metric space $Y$. Let $\mathcal{F}$ denote
the set of all functions $f : X \to Y$. Recall from Exercise 2.116 that the metric $d_Y$ on
$Y$ gives rise to a bounded metric

$$\rho(y, y') = \frac{d_Y(y, y')}{1 + d_Y(y, y')},$$

where $y, y' \in Y$. Note that $\rho < 1$. Convergence and Cauchyness with respect to $\rho$ and
$d_Y$ are equivalent. Thus completeness of $Y$ with respect to $d_Y$ implies completeness
with respect to $\rho$. In the same way we give $\mathcal{F}$ the metric

$$d(f, g) = \sup_{x \in X} \frac{d_Y(fx, gx)}{1 + d_Y(fx, gx)}.$$

A function $f \in \mathcal{F}$ is bounded with respect to $d_Y$ if and only if for any constant
function $c$ we have $\sup_x d_Y(f(x), c) < \infty$; i.e., $d(f, c) < 1$. Unbounded functions have
$d(f, c) = 1$.
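A small numerical experiment makes the behavior of the bounded metric concrete. The sketch below takes $Y = \mathbb{R}$ with $d_Y(y, z) = |y - z|$ and replaces the supremum over $X = \mathbb{R}$ by a maximum over finitely many sample points, so it only approximates $d(f, 0)$ from below; the sample grid and the helper names `rho` and `sup_rho_to_zero` are assumptions made for the illustration.

```python
import math

def rho(y, z):
    """Bounded metric rho = d / (1 + d) built from d(y, z) = |y - z| on the real line."""
    d = abs(y - z)
    return d / (1.0 + d)

def sup_rho_to_zero(f, xs):
    """Sampled stand-in for d(f, 0) = sup_x rho(f(x), 0), taken over the points xs only."""
    return max(rho(f(x), 0.0) for x in xs)

xs = [0.1 * k for k in range(-10000, 10001)]        # sample points in [-1000, 1000]
bounded_case = sup_rho_to_zero(math.sin, xs)        # sin is bounded: stays at most 1/2
unbounded_case = sup_rho_to_zero(lambda x: x, xs)   # identity: creeps toward 1
```

For the bounded function the sampled value stays at most $1/2$, well away from 1, while for the unbounded identity function it approaches 1 as the sample window grows, matching the dichotomy $d(f, c) < 1$ versus $d(f, c) = 1$.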

35 Theorem In the space $\mathcal{F}$ equipped with the metric $d$,

(a) Uniform convergence of $(f_n)$ is equivalent to $d$-convergence.

(b) Completeness of $Y$ implies completeness of $\mathcal{F}$.

(c) The set $\mathcal{F}_b$ of bounded functions is closed in $\mathcal{F}$.

(d) The set $C^0(X, Y)$ of continuous functions is closed in $\mathcal{F}$.

Proof (a) $f = \operatorname{unif\,lim}_{n \to \infty} f_n$ means that $d_Y(f_n(x), f(x)) \rightrightarrows 0$, which means that
$d(f_n, f) \to 0$.

(b) If $(f_n)$ is Cauchy in $\mathcal{F}$ and $Y$ is complete then, just as in Section 1, $f(x) =
\lim_{n \to \infty} f_n(x)$ exists for each $x \in X$. Cauchyness with respect to the metric $d$ implies
uniform convergence and thus $d(f_n, f) \to 0$.

(c) If $f_n \in \mathcal{F}_b$ and $d(f_n, f) \to 0$ then $\sup_x d_Y(f_n(x), f(x)) \to 0$. Since $f_n$ is
bounded, so is $f$.

(d) The proof that $C^0$ is closed in $\mathcal{F}$ is the same as in Section 1. □

The Arzela-Ascoli theorem is trickier. A family $\mathcal{E} \subset \mathcal{F}$ is uniformly equicontinuous
if for each $\epsilon > 0$ there is a $\delta > 0$ such that $f \in \mathcal{E}$ and $d_X(x, t) < \delta$ imply
$d_Y(f(x), f(t)) < \epsilon$. If the $\delta$ depends on $x$ but not on $f \in \mathcal{E}$ then $\mathcal{E}$ is pointwise
equicontinuous.

36 Theorem Pointwise equicontinuity implies uniform equicontinuity if $X$ is compact.

Proof Suppose not. Then there exists $\epsilon > 0$ such that for each $\delta = 1/n$ we have
points $x_n, t_n \in X$ and functions $f_n \in \mathcal{E}$ with $d_X(x_n, t_n) < 1/n$ and $d_Y(f_n(x_n), f_n(t_n)) \ge \epsilon$.
By compactness of $X$ we may assume that $x_n \to x_0$. Then $t_n \to x_0$, which leads
to a contradiction of pointwise equicontinuity at $x_0$. □

37 Theorem If the sequence of functions $f_n : X \to Y$ is uniformly equicontinuous,
$X$ is compact, and for each $x \in X$, the sequence $(f_n(x))$ lies in a compact subset of
$Y$, then $(f_n)$ has a uniformly convergent subsequence.

Proof Being compact, $X$ has a countable dense subset $D$. Then the proof of the
Arzela-Ascoli Theorem in Section 3 becomes a proof of Theorem 37. □

The space $X$ is $\sigma$-compact if it is a countable union of compact sets, $X = \bigcup X_i$.
For example $\mathbb{Z}$, $\mathbb{Q}$, $\mathbb{R}$ and $\mathbb{R}^m$ are $\sigma$-compact, while any uncountable set equipped with
the discrete metric is not $\sigma$-compact.

38 Theorem If $X$ is $\sigma$-compact and if $(f_n)$ is a sequence of pointwise equicontinuous
functions such that for each $x \in X$, the sequence $(f_n(x))$ lies in a compact subset of
$Y$, then $(f_n)$ has a subsequence that converges uniformly to a limit on each compact
subset of $X$.

Proof Express $X$ as $\bigcup X_i$ with $X_i$ compact. By Theorem 36 $(f_n|_{X_i})$ is uniformly
equicontinuous. By Theorem 37 there is a subsequence $f_{1,n}$ that converges uniformly
on $X_1$, and it has a sub-subsequence $f_{2,n}$ that converges uniformly on $X_2$, and so
on. A diagonal subsequence $(g_m)$ converges uniformly on each $X_i$. Thus $(g_m)$ converges
pointwise. If $A \subset X$ is compact, then $(g_m|_A)$ is uniformly equicontinuous and
pointwise convergent. By the proof of the Arzela-Ascoli propagation theorem, $(g_m|_A)$
converges uniformly. □


39 Corollary If $(f_n)$ is a sequence of pointwise equicontinuous functions $\mathbb{R} \to \mathbb{R}$,
and for some $x_0 \in \mathbb{R}$, $(f_n(x_0))$ is bounded then $(f_n)$ has a subsequence that converges
uniformly on every compact subset of $\mathbb{R}$.


Proof Let $[a, b]$ be any interval containing $x_0$. By Theorem 36, the restrictions of $f_n$
to $[a, b]$ are uniformly equicontinuous, and there is a $\delta > 0$ such that if $t, s \in [a, b]$
then $|t - s| < \delta$ implies that $|f_n(t) - f_n(s)| < 1$. Each point $x \in [a, b]$ can be reached in
$\le N$ steps of length $< \delta$ starting at $x_0$, if $N > (b - a)/\delta$. Thus $|f_n(x)| \le |f_n(x_0)| + N$,
and $(f_n(x))$ is bounded for each $x \in \mathbb{R}$. A bounded subset of $\mathbb{R}$ has compact closure
and Theorem 38 gives the corollary. □

Exercises

In these exercises, $C^0 = C^0([a, b], \mathbb{R})$ is the space of continuous real-valued functions
defined on the closed interval $[a, b]$. It is equipped with the sup norm, $\|f\| =
\sup\{ |f(x)| : x \in [a, b] \}$.

1. Let $M$, $N$ be metric spaces.

(a) Formulate the concepts of pointwise convergence and uniform convergence
for sequences of functions $f_n : M \to N$.

(b) For which metric spaces are the concepts equivalent?

2. Suppose that $f_n \rightrightarrows f$ where $f$ and $f_n$ are functions from the metric space
$M$ to the metric space $N$. (Assume nothing about the metric spaces such
as compactness, completeness, etc.) If each $f_n$ is continuous prove that $f$ is
continuous. [Hint: Review the proof of Theorem 1.]

3. Let $f_n : [a, b] \to \mathbb{R}$ be a sequence of piecewise continuous functions, each of
which is continuous at the point $x_0 \in [a, b]$. Assume that $f_n \rightrightarrows f$ as $n \to \infty$.

(a) Prove that $f$ is continuous at $x_0$. [Hint: Review the proof of Theorem 1.]

(b) Prove or disprove that $f$ is piecewise continuous.

4. (a) If $f_n : \mathbb{R} \to \mathbb{R}$ is uniformly continuous for each $n \in \mathbb{N}$ and if $f_n \rightrightarrows f$ as
$n \to \infty$, prove or disprove that $f$ is uniformly continuous.

(b) What happens for functions from one metric space to another instead of
$\mathbb{R}$ to $\mathbb{R}$?

5. Suppose that $f_n : [a, b] \to \mathbb{R}$ and $f_n \rightrightarrows f$ as $n \to \infty$. Which of the following
discontinuity properties (see Exercise 3.36) of the functions $f_n$ carries over to
the limit function? (Prove or give a counterexample.)

(a) No discontinuities.

(b) At most ten discontinuities.

(c) At least ten discontinuities.

(d) Finitely many discontinuities.

(e) Countably many discontinuities, all of jump type.

(f) No jump discontinuities.

(g) No oscillating discontinuities.

6. (a) Prove that $C^0$ and $\mathbb{R}$ have equal cardinality. [Clearly there are at least
as many functions as there are real numbers, for $C^0$ includes the constant
functions. The issue is to show that there are no more continuous functions
than there are real numbers.]

(b) Is the same true if we replace $[a, b]$ with $\mathbb{R}$ or a separable metric space?

(c) In the same vein, prove that the collection $\mathcal{T}$ of open subsets of $\mathbb{R}$ and $\mathbb{R}$
itself have equal cardinality.

(d) What about more general metric spaces in place of R? 
7. Consider a sequence of functions $f_n$ in $C^0$. The graph $G_n$ of $f_n$ is a compact
subset of $\mathbb{R}^2$.

(a) Prove that $(f_n)$ converges uniformly as $n \to \infty$ if and only if the sequence
$(G_n)$ in $\mathcal{K}(\mathbb{R}^2)$ converges to the graph of a function $f \in C^0$. (The space
$\mathcal{K}$ was discussed in Exercise 2.147.)

(b) Formulate equicontinuity in terms of graphs. 

8. Is the sequence of functions $f_n : \mathbb{R} \to \mathbb{R}$ defined by

$$f_n(x) = \cos(n + x) + \log\Bigl(1 + \frac{1}{\sqrt{n+2}}\,\sin^2(n^n x)\Bigr)$$

equicontinuous? Prove or disprove.

9. If $f : \mathbb{R} \to \mathbb{R}$ is continuous and the sequence $f_n(x) = f(nx)$ is equicontinuous,
what can be said about $f$?

10. Give an example to show that a sequence of functions may be uniformly contin- 
uous, pointwise equicontinuous, but not uniformly equicontinuous, when their 
domain M is noncompact. 

11. If every sequence of pointwise equicontinuous functions $M \to \mathbb{R}$ is uniformly
equicontinuous, does this imply that $M$ is compact?

12. Prove that if $\mathcal{E} \subset C^0(M,N)$ is equicontinuous then so is its closure.

13. Suppose that $(f_n)$ is a sequence of functions $\mathbb{R} \to \mathbb{R}$ and for each compact subset
$K \subset \mathbb{R}$, the restricted sequence $(f_n|_K)$ is pointwise bounded and pointwise
equicontinuous.

(a) Does it follow that there is a subsequence of $(f_n)$ that converges pointwise
to a continuous limit function $\mathbb{R} \to \mathbb{R}$?

(b) What about uniform convergence? 

14. Recall from Exercise 2.78 that a metric space $M$ is chain connected if for each
$\epsilon > 0$ and each $p, q \in M$ there is a chain $p = p_0, \ldots, p_n = q$ in $M$ such that

$$d(p_{k-1}, p_k) < \epsilon \quad \text{for } 1 \le k \le n.$$

A family $\mathcal{F}$ of functions $f : M \to \mathbb{R}$ is bounded at $p \in M$ if the set $\{f(p) : f \in \mathcal{F}\}$
is bounded in $\mathbb{R}$.

Show that M is chain connected if and only if pointwise boundedness of an 
equicontinuous family at one point of M implies pointwise boundedness at 
every point of M . 

15. A continuous, strictly increasing function $\mu : (0, \infty) \to (0, \infty)$ is a modulus of
continuity if $\mu(s) \to 0$ as $s \to 0$. A function $f : [a,b] \to \mathbb{R}$ has modulus of
continuity $\mu$ if $|f(s) - f(t)| \le \mu(|s - t|)$ for all $s, t \in [a,b]$.

(a) Prove that a function is uniformly continuous if and only if it has a modulus 
of continuity. 

(b) Prove that a family of functions is equicontinuous if and only if its members 
have a common modulus of continuity. 
16. Consider the modulus of continuity $\mu(s) = Ls$ where $L$ is a positive constant.

(a) What is the relation between $C^\mu$ and the set of Lipschitz functions with
Lipschitz constant $\le L$?

(b) Replace $[a,b]$ with $\mathbb{R}$ and answer the same question.

(c) Replace $[a,b]$ with $\mathbb{N}$ and answer the same question.

(d) Formulate and prove a generalization of (a).

17. Consider a modulus of continuity $\mu(s) = Hs^\alpha$ where $0 < \alpha \le 1$ and $0 < H < \infty$.
A function with this modulus of continuity is said to be $\alpha$-Hölder, with $\alpha$-Hölder
constant $H$. See also Exercise 3.2.

(a) Prove that the set $C^\alpha(H)$ of all continuous functions defined on $[a,b]$ which
are $\alpha$-Hölder and have $\alpha$-Hölder constant $\le H$ is equicontinuous.

(b) Replace [a, b] with (a, b). Is the same thing true? 

(c) Replace [a, b] with R. Is it true? 

(d) What about Q? 

(e) What about N? 

18. Suppose that $(f_n)$ is an equicontinuous sequence in $C^0$ and $p \in [a,b]$ is given.

(a) If $(f_n(p))$ is a bounded sequence of real numbers, prove that $(f_n)$ is
uniformly bounded.

(b) Reformulate the Arzela-Ascoli Theorem with the weaker boundedness
hypothesis in (a).

(c) Can $[a,b]$ be replaced with $(a,b)$? $\mathbb{Q}$? $\mathbb{R}$? $\mathbb{N}$?

(d) What is the correct generalization?

19. If $M$ is compact and $A$ is dense in $M$, prove that for each $\delta > 0$ there is a finite
subset $\{a_1, \ldots, a_k\} \subset A$ which is $\delta$-dense in $M$ in the sense that each $x \in M$
lies within distance $\delta$ of at least one of the points $a_1, \ldots, a_k$.

20. Given constants $\alpha, \beta > 0$ define

$$f_{\alpha,\beta}(x) = x^\alpha \sin(x^\beta)$$

for $x > 0$.

(a) For which pairs $\alpha, \beta$ is $f_{\alpha,\beta}$ uniformly continuous?

(b) For which sets of $(\alpha, \beta)$ in $(0, \infty)^2$ is the family equicontinuous?

[Hint: Draw a picture of the graphs when $\alpha > 2$ or $\beta > 2$. How about $\alpha > 1$ or
$\beta > 1$?]

21. Suppose that $\mathcal{E} \subset C^0$ is equicontinuous and bounded.

(a) Prove that $\sup\{f(x) : f \in \mathcal{E}\}$ is a continuous function of $x$.

(b) Show that (a) fails without equicontinuity.

(c) Show that this continuous-sup property does not imply equicontinuity.

(d) Assume that the continuous-sup property is true for each subset $\mathcal{F} \subset \mathcal{E}$.
Is $\mathcal{E}$ equicontinuous? Give a proof or counterexample.

22. Give an example of a sequence of smooth equicontinuous functions $f_n : [a,b] \to \mathbb{R}$
whose derivatives are not uniformly bounded.

23. Let $M$ be a compact metric space, and let $(i_n)$ be a sequence of isometries
$i_n : M \to M$.

(a) Prove that there exists a subsequence $i_{n_k}$ that converges to an isometry $i$
as $k \to \infty$.

(b) Infer that the space of self-isometries of $M$ is compact.

(c) Does the inverse isometry $i_{n_k}^{-1}$ converge to $i^{-1}$? (Proof or counterexample.)

(d) Infer that the group of orthogonal $3 \times 3$ matrices is compact. [Hint: Is
it true that each orthogonal $3 \times 3$ matrix defines an isometry of the unit
2-sphere to itself?]

(e) How about the group of $m \times m$ orthogonal matrices?

24. Suppose that a sequence of continuous functions $f_n : [a,b] \to \mathbb{R}$ converges
monotonically down to a limit function $f$. (That is, for all $x \in [a,b]$ we have
$f_1(x) \ge f_2(x) \ge f_3(x) \ge \ldots$ and $f_n(x) \to f(x)$ as $n \to \infty$.)

(a) Prove that the convergence is uniform and conclude that $f$ is continuous.

(b) What if the sequence is increasing instead of decreasing?

(c) What if you replace $[a,b]$ with $\mathbb{R}$?

(d) What if you replace $[a,b]$ with a compact metric space or $\mathbb{R}^m$?

25. Suppose that $f : M \to M$ is a contraction, but $M$ is not necessarily complete.

(a) Prove that $f$ is uniformly continuous.

(b) Why does (a) imply that $f$ extends uniquely to a continuous map $\widehat{f} : \widehat{M} \to \widehat{M}$,
where $\widehat{M}$ is the completion of $M$?

(c) Is $\widehat{f}$ a contraction?

26. Give an example of a contraction of an incomplete metric space that has no
fixed-point.

27. Suppose that $f : M \to M$ and for all $x, y \in M$, if $x \ne y$ then $d(fx, fy) < d(x, y)$.
Such an $f$ is a weak contraction.

(a) Is a weak contraction a contraction? (Proof or counterexample.) 

(b) If M is compact is a weak contraction a contraction? (Proof or counterex- 
ample.) 

(c) If M is compact, prove that a weak contraction has a unique fixed-point. 

28. Suppose that $f : \mathbb{R} \to \mathbb{R}$ is differentiable and its derivative satisfies $|f'(x)| < 1$
for all $x \in \mathbb{R}$.

(a) Is / a contraction? 

(b) A weak one? 

(c) Does it have a fixed-point? 

29. Give an example to show that the fixed-point in Brouwer’s Theorem need not 
be unique. 
30. Give an example of a continuous map of a compact, nonempty, path-connected
metric space into itself that has no fixed-point.

31. On page 233 it is shown that if $b_n \int_{-1}^{1} (1 - t^2)^n \, dt = 1$ then for some constant $c$,
and for all $n \in \mathbb{N}$, $b_n \le c\sqrt{n}$. What is the best (i.e., smallest) value of $c$ that
you can prove works? (A calculator might be useful here.)

32. Let $M$ be a compact metric space, and let $C^{\mathrm{Lip}}$ be the set of continuous functions
$f : M \to \mathbb{R}$ that obey a Lipschitz condition: For some $L$ and all $p, q \in M$ we
have

$$|fp - fq| \le L\, d(p, q).$$

*(a) Prove that $C^{\mathrm{Lip}}$ is dense in $C^0(M, \mathbb{R})$. [Hint: Stone-Weierstrass.]

***(b) If $M = [a,b]$ and $\mathbb{R}$ is replaced by some other complete, path-connected
metric space, is the result true or false?

***(c) If $M$ is a general compact metric space and $Y$ is a complete metric space,
is $C^{\mathrm{Lip}}(M, Y)$ dense in $C^0(M, Y)$? (Would $M$ equal to the Cantor set
make a good test case?)

33. Consider the ODE $x' = x$ on $\mathbb{R}$. Show that its solution with initial condition
$x_0$ is $t \mapsto e^t x_0$. Interpret $e^{t+s} = e^t e^s$ in terms of the flow property.

34. Consider the ODE $y' = 2\sqrt{|y|}$ where $y \in \mathbb{R}$.

(a) Show that there are many solutions to this ODE, all with the same initial
condition $y(0) = 0$. Not only does $y(t) = 0$ solve the ODE, but also
$y(t) = t^2$ does for $t \ge 0$.

(b) Find and graph other solutions such as $y(t) = 0$ for $t \le c$ and $y(t) = (t - c)^2$
for $t \ge c > 0$.

(c) Does the existence of these nonunique solutions to the ODE contradict
Picard's Theorem? Explain.

*(d) Find all solutions with initial condition $y(0) = 0$.

35. Consider the ODE $x' = x^2$ on $\mathbb{R}$. Find the solution of the ODE with initial
condition $x_0$. Are the solutions to this ODE defined for all time or do they
escape to infinity in finite time?

36. Suppose that the ODE $x' = f(x)$ on $\mathbb{R}$ is bounded, $|f(x)| \le M$ for all $x$.

(a) Prove that no solution of the ODE escapes to infinity in finite time.

(b) Prove the same thing if $f$ satisfies a Lipschitz condition, or more generally,
if there are constants $C$, $K$ such that $|f(x)| \le C|x| + K$ for all $x$.

(c) Repeat (a) and (b) with $\mathbb{R}^m$ in place of $\mathbb{R}$.

(d) Prove that if $f : \mathbb{R}^m \to \mathbb{R}^m$ is uniformly continuous then the condition
stated in (b) is true. Infer that solutions of uniformly continuous ODEs
defined on $\mathbb{R}^m$ do not escape to infinity in finite time.

37. (a) Prove Borel's Lemma, which states that given any sequence whatsoever
of real numbers $(a_r)$, there is a smooth function $f : \mathbb{R} \to \mathbb{R}$ such that
$f^{(r)}(0) = a_r$. [Hint: Try $f = \sum \beta_k(x)\, a_k x^k / k!$ where $\beta_k$ is a well-chosen
bump function.]

(b) Infer that there are many Taylor series with radius of convergence $R = 0$.

(c) Construct a smooth function whose Taylor series at every $x$ has radius of
convergence $R = 0$. [Hint: Try $\sum \beta_k(x)\, e(x + q_k)$ where $\{q_1, q_2, \ldots\} = \mathbb{Q}$.]

*38. Suppose that $T \subset (a,b)$ clusters at some point of $(a,b)$ and that $f, g : (a,b) \to \mathbb{R}$
are analytic. Assume that for all $t \in T$ we have $f(t) = g(t)$.

(a) Prove that $f = g$ everywhere in $(a,b)$.

(b) What if $f$ and $g$ are only $C^\infty$?

(c) What if $T$ is an infinite set but its only cluster points are $a$ and $b$?

**(d) Find a necessary and sufficient condition for a subset $Z \subset (a,b)$ to be
the zero locus of an analytic function $f$ defined on $(a,b)$, $Z = \{x \in (a,b) : f(x) = 0\}$.
[Hint: Think Taylor. The result in (a) is known as
the Identity Theorem. It states that if an equality between analytic
functions is known to hold for points of $T$ then it is an “identity,” an
equality that holds everywhere.]

39. Let $M$ be any metric space with metric $d$. Fix a point $p \in M$ and for each
$q \in M$ define the function $f_q(x) = d(q, x) - d(p, x)$.

(a) Prove that $f_q$ is a bounded, continuous function of $x \in M$, and that the
map $q \mapsto f_q$ sends $M$ isometrically onto a subset $M_0$ of $C_b^0(M, \mathbb{R})$.

(b) Since $C_b^0(M, \mathbb{R})$ is complete, infer that an isometric copy of $M$ is dense
in a complete metric space, namely the closure of $M_0$, and hence that we
have a second proof of the Completion Theorem 2.80.

40. As explained in Section 8, a metric space $M$ is $\sigma$-compact if it is the countable
union of compact subsets, $M = \bigcup M_i$.

(a) Why is it equivalent to require that $M$ is the monotone union of compact
subsets, $M = \bigcup M_i$ with $M_1 \subset M_2 \subset \ldots$?

(b) Prove that a $\sigma$-compact metric space is separable.

(c) Prove that $\mathbb{Z}$, $\mathbb{Q}$, $\mathbb{R}$, $\mathbb{R}^m$ are $\sigma$-compact.

*(d) Prove that $C^0$ is not $\sigma$-compact. [Hint: Think Baire.]

*(e) If $M$ is the monotone union $\bigcup \operatorname{int}(M_i)$ and each $M_i$ is compact, $M$ is $\sigma^*$-compact. Prove that
$M$ is $\sigma^*$-compact if and only if it is separable and locally compact. Infer
that $\mathbb{Z}$, $\mathbb{R}$, and $\mathbb{R}^m$ are $\sigma^*$-compact but $\mathbb{Q}$ is not.

(f) Assume that $M$ is $\sigma^*$-compact, $M = \bigcup \operatorname{int}(M_i)$ as a monotone union, with each $M_i$ compact.
Prove that this monotone union “engulfs” all compacts in $M$, in the sense
that if $A \subset M$ is compact, then for some $i$, $A \subset M_i$.

(g) If $M$ is the monotone union $\bigcup M_i$ and each $M_i$ is compact, show by example that this engulfing
property may fail, even when $M$ itself is compact.

**(h) Prove or disprove that a complete $\sigma$-compact metric space is $\sigma^*$-compact.
41. (a) Give an example of a function $f : [0,1] \times [0,1] \to \mathbb{R}$ such that for each
fixed $x$, the function $y \mapsto f(x,y)$ is a continuous function of $y$, and for
each fixed $y$, the function $x \mapsto f(x,y)$ is a continuous function of $x$, but $f$
is not continuous.

(b) Suppose in addition that the set of functions

$$\mathcal{E} = \{x \mapsto f(x,y) : y \in [0,1]\}$$

is equicontinuous. Prove that $f$ is continuous.

42. Prove that R cannot be expressed as the countable union of Cantor sets. 

43. What is the joke in the following picture? 
More Prelim Problems


1. Let $f$ and $f_n$, $n \in \mathbb{N}$, be functions from $\mathbb{R}$ to $\mathbb{R}$. Assume that $f_n(x_n) \to f(x)$ as
$n \to \infty$ and $x_n \to x$. Show that $f$ is continuous. (Note: The functions $f_n$ are
not assumed to be continuous.)

2. Suppose that $f_n \in C^0$ and for each $x \in [a,b]$,

$$f_1(x) \ge f_2(x) \ge \ldots$$

and $\lim_{n \to \infty} f_n(x) = 0$. Is the sequence equicontinuous? Give a proof or
counterexample. [Hint: Does $f_n(x)$ converge uniformly to 0, or does it not?]

3. Let $E$ be the set of all functions $u : [0,1] \to \mathbb{R}$ such that $u(0) = 0$ and $u$ satisfies
a Lipschitz condition with Lipschitz constant 1. Define $\phi : E \to \mathbb{R}$ according to
the formula

$$\phi(u) = \int_0^1 \bigl(u(x)^2 - u(x)\bigr)\, dx.$$

Prove that there exists a function $u \in E$ at which $\phi(u)$ attains an absolute
maximum.

4. Let $(g_n)$ be a sequence of twice-differentiable functions defined on $[0,1]$, and
assume that for all $n$, $g_n(0) = g_n'(0)$. Suppose also that for all $n \in \mathbb{N}$ and all
$x \in [0,1]$, $|g_n''(x)| \le 1$. Prove that there is a subsequence of $(g_n)$ converging
uniformly on $[0,1]$.

5. Let $(a_n)$ be a sequence of nonzero real numbers. Prove that the sequence of
functions

$$f_n(x) = \frac{1}{a_n}\sin(a_n x) + \cos(x + a_n)$$

has a subsequence converging to a continuous function.

6. Suppose that $f : \mathbb{R} \to \mathbb{R}$ is differentiable, $f(0) = 0$, and $f'(x) > f(x)$ for all
$x \in \mathbb{R}$. Prove that $f(x) > 0$ for all $x > 0$.

7. Suppose that $f : [a,b] \to \mathbb{R}$ and the limits of $f(x)$ from the left and the right
exist at all points of $[a,b]$. Prove that $f$ is Riemann integrable.

8. Let $h : [0,1) \to \mathbb{R}$ be a uniformly continuous function where $[0,1)$ is the
half-open interval. Prove that there is a unique continuous map $g : [0,1] \to \mathbb{R}$ such
that $g(x) = h(x)$ for all $x \in [0,1)$.

9. Assume that $f : \mathbb{R} \to \mathbb{R}$ is uniformly continuous. Prove that there are constants
$A$, $B$ such that $|f(x)| \le A + B|x|$ for all $x \in \mathbb{R}$.

10. Suppose that $f(x)$ is defined on $[-1, 1]$ and that its third derivative exists and
is continuous. (That is, $f$ is of class $C^3$.) Prove that the series

$$\sum_{n=1}^{\infty} \Bigl( n\bigl(f(1/n) - f(-1/n)\bigr) - 2f'(0) \Bigr)$$

converges.
11. Let $A \subset \mathbb{R}^m$ be compact, $x \in A$. Let $(x_n)$ be a sequence in $A$ such that every
convergent subsequence of $(x_n)$ converges to $x$.

(a) Prove that the sequence $(x_n)$ converges.

(b) Give an example to show if $A$ is not compact, the result in (a) is not
necessarily true.

12. Let $f : [0,1] \to \mathbb{R}$ be continuously differentiable, with $f(0) = 0$. Prove that

$$\|f\|^2 \le \int_0^1 (f'(x))^2\, dx$$

where $\|f\| = \sup\{|f(t)| : 0 \le t \le 1\}$.

13. Let $f_n : \mathbb{R} \to \mathbb{R}$ be differentiable functions, $n = 1, 2, \ldots$, with $f_n(0) = 0$ and
$|f_n'(x)| \le 2$ for all $n$, $x$. Suppose that

$$\lim_{n \to \infty} f_n(x) = g(x)$$

for all $x$. Prove that $g$ is continuous.

14. Let $X$ be a nonempty connected set of real numbers. If every element of $X$ is
rational, prove that $X$ has only one element.

15. Let $k \ge 0$ be an integer and define a sequence of maps $f_n : \mathbb{R} \to \mathbb{R}$ as

$$f_n(x) = \frac{x^k}{x^2 + n},$$

$n = 1, 2, \ldots$. For which values of $k$ does the sequence converge uniformly on $\mathbb{R}$?
On every bounded subset of $\mathbb{R}$?

16. Let $f : [0,1] \to \mathbb{R}$ be Riemann integrable over $[b, 1]$ for every $b$ such that
$0 < b \le 1$.

(a) If $f$ is bounded, prove that $f$ is Riemann integrable over $[0,1]$.

(b) What if $f$ is not bounded?

17. (a) Let $S$ and $T$ be connected subsets of the plane $\mathbb{R}^2$ having a point in
common. Prove that $S \cup T$ is connected.

(b) Let $\{S_\alpha\}$ be a family of connected subsets of $\mathbb{R}^2$ all containing the origin.
Prove that $\bigcup S_\alpha$ is connected.

18. Let $f : \mathbb{R} \to \mathbb{R}$ be continuous. Suppose that $\mathbb{R}$ contains a countably infinite set
$S$ such that

$$\int_p^q f(x)\, dx = 0$$

if $p$ and $q$ are not in $S$. Prove that $f$ is identically zero.

19. Let $f : \mathbb{R} \to \mathbb{R}$ satisfy $f(x) \le f(y)$ for $x \le y$. Prove that the set where $f$ is not
continuous is finite or countably infinite.
20. Let $(g_n)$ be a sequence of Riemann integrable functions from $[0,1]$ into $\mathbb{R}$ such
that $|g_n(x)| \le 1$ for all $n$, $x$. Define

$$G_n(x) = \int_0^x g_n(t)\, dt.$$

Prove that a subsequence of $(G_n)$ converges uniformly.

21. Prove that every compact metric space has a countable dense subset.

22. Show that for any continuous function $f : [0,1] \to \mathbb{R}$ and for any $\epsilon > 0$ there is
a function of the form

$$g(x) = \sum_{k=0}^{n} c_k x^k$$

for some $n \in \mathbb{N}$, and $|g(x) - f(x)| < \epsilon$ for all $x$ in $[0,1]$.

23. Give an example of a function $f : \mathbb{R} \to \mathbb{R}$ having all three of the following
properties:

(a) $f(x) = 0$ for all $x < 0$ and $x > 2$.

(b) $f(1) = 1$.

(c) $f$ has derivatives of all orders.

24. (a) Give an example of a differentiable function $f : \mathbb{R} \to \mathbb{R}$ whose derivative
is not continuous.

(b) Let $f$ be as in (a). If $f'(0) < 2 < f'(1)$ prove that $f'(x) = 2$ for some
$x \in [0,1]$.

25. Let $U \subset \mathbb{R}^m$ be an open set. Suppose that the map $h : U \to \mathbb{R}^m$ is a
homeomorphism from $U$ onto $\mathbb{R}^m$ which is uniformly continuous. Prove that $U = \mathbb{R}^m$.

26. Let $(f_n)$ be a sequence of continuous maps $[0,1] \to \mathbb{R}$ such that

$$\int_0^1 (f_n(y))^2\, dy \le 5$$

for all $n$. Define $g_n : [0,1] \to \mathbb{R}$ by

$$g_n(x) = \int_0^1 \sqrt{x + y}\, f_n(y)\, dy.$$

(a) Find a constant $K > 0$ such that $|g_n(x)| \le K$ for all $n$.

(b) Prove that a subsequence of the sequence $(g_n)$ converges uniformly.

27. Consider the following properties of a map $f : \mathbb{R}^m \to \mathbb{R}$.

(a) $f$ is continuous.

(b) The graph of $f$ is connected in $\mathbb{R}^m \times \mathbb{R}$.

Prove or disprove the implications (a) $\Rightarrow$ (b), (b) $\Rightarrow$ (a).

28. Let $(P_n)$ be a sequence of real polynomials of degree $\le 10$. Suppose that

$$\lim_{n \to \infty} P_n(x) = 0$$

for all $x \in [0,1]$. Prove that $P_n(x) \rightrightarrows 0$, $0 \le x \le 1$. What can you say about
$P_n(x)$ for $4 \le x \le 5$?

29. Give an example of a subset of $\mathbb{R}$ having uncountably many connected
components. Can such a subset be open? Closed? Does your answer change if $\mathbb{R}^2$
replaces $\mathbb{R}$?

30. For each $(a, b, c) \in \mathbb{R}^3$ consider the series

$$\sum_{n=3}^{\infty} \frac{a^n}{n^b (\log n)^c}.$$

Determine the values of $a$, $b$, and $c$ for which the series converges absolutely,
converges conditionally, diverges.

31. Let $X$ be a compact metric space and $f : X \to X$ an isometry. (That is,
$d(f(x), f(y)) = d(x, y)$ for all $x, y \in X$.) Prove that $f(X) = X$.

32. Prove or disprove: $\mathbb{Q}$ is the countable intersection of open subsets of $\mathbb{R}$.

33. Let $f : \mathbb{R} \to \mathbb{R}$ be continuous and

$$\int_{-\infty}^{\infty} |f(x)|\, dx < \infty.$$

Show that there is a sequence $(x_n)$ in $\mathbb{R}$ such that $x_n \to \infty$, $x_n f(x_n) \to 0$, and
$x_n f(-x_n) \to 0$ as $n \to \infty$.

34. Let $f : [0,1] \to \mathbb{R}$ be a continuous function. Evaluate the following limits (with
proof):

(a) $\lim_{n \to \infty} \int_0^1 x^n f(x)\, dx$ (b) $\lim_{n \to \infty} n \int_0^1 x^n f(x)\, dx$.

35. Let $K$ be an uncountable subset of $\mathbb{R}^m$. Prove that there is a sequence of
distinct points in $K$ which converges to some point of $K$.

36. Prove or give a counterexample: Every connected locally pathwise-connected
set in $\mathbb{R}^m$ is pathwise-connected.

37. Let $(f_n)$ be a sequence of continuous functions $[0,1] \to \mathbb{R}$ such that $f_n(x) \to 0$
for each $x \in [0,1]$. Suppose that

$$\left| \int_0^1 f_n(x)\, dx \right| \le K$$

for all $n$ where $K$ is a constant. Does $\int_0^1 f_n(x)\, dx$ converge to 0 as $n \to \infty$?
Prove or give a counterexample.

38. Let $E$ be a closed, bounded, and nonempty subset of $\mathbb{R}^m$ and let $f : E \to E$ be
a function satisfying $|f(x) - f(y)| < |x - y|$ for all $x, y \in E$, $x \ne y$. Prove that
there is one and only one point $x_0 \in E$ such that $f(x_0) = x_0$.
39. Let $f : [0, 2\pi] \to \mathbb{R}$ be a continuous function such that

$$\int_0^{2\pi} f(x) \sin(nx)\, dx = 0$$

for all integers $n \ge 1$. Prove that $f$ is identically constant.

40. Let $f_1, f_2, \ldots$ be continuous real-valued functions on $[0,1]$ such that for each
$x \in [0,1]$, $f_1(x) \ge f_2(x) \ge \ldots$. Assume that for each $x$, $f_n(x)$ converges to 0 as
$n \to \infty$. Does $f_n$ converge uniformly to 0? Give a proof or counterexample.

41. Let $f : [0, \infty) \to [0, \infty)$ be a monotonically decreasing function with

$$\int_0^{\infty} f(x)\, dx < \infty.$$

Prove that $\lim_{x \to \infty} x f(x) = 0$.

42. Suppose that $F : \mathbb{R}^m \to \mathbb{R}^m$ is continuous and satisfies

$$|F(x) - F(y)| \ge \lambda |x - y|$$

for all $x, y \in \mathbb{R}^m$ and some constant $\lambda > 0$. Prove that $F$ is one-to-one, is onto,
and has a continuous inverse.

43. Show that $[0,1]$ cannot be written as a countably infinite union of disjoint closed
subintervals.

44. Prove that a continuous function $f : \mathbb{R} \to \mathbb{R}$ which sends open sets to open sets
must be monotonic.

45. Let $f : [0, \infty) \to \mathbb{R}$ be uniformly continuous and assume that

$$\lim_{b \to \infty} \int_0^b f(x)\, dx$$

exists (as a finite limit). Prove that

$$\lim_{x \to \infty} f(x) = 0.$$

46. Prove or supply a counterexample: If $f$ and $g$ are continuously differentiable
functions defined on the interval $0 < x < 1$ which satisfy the conditions

$$\lim_{x \to 0} f(x) = 0 = \lim_{x \to 0} g(x) \quad \text{and} \quad \lim_{x \to 0} \frac{f(x)}{g(x)} = c,$$

and if $g$ and $g'$ never vanish, then $\lim_{x \to 0} \frac{f'(x)}{g'(x)} = c$. (This is a converse of
L'Hopital's rule.)

47. Prove or provide a counterexample: If the function $f$ from $\mathbb{R}$ to $\mathbb{R}$ has both a
left and a right limit at each point of $\mathbb{R}$, then the set of discontinuities is at
most countable.
48. Prove or supply a counterexample: If $f$ is a nondecreasing real-valued function
on $[0,1]$ then there is a sequence $f_n$, $n = 1, 2, \ldots$, of continuous functions on
$[0,1]$ such that for each $x$ in $[0,1]$, $\lim_{n \to \infty} f_n(x) = f(x)$.

49. Show that if $f$ is a homeomorphism of $[0,1]$ onto itself then there is a sequence
of polynomials $P_n(x)$, $n = 1, 2, \ldots$, such that $P_n \to f$ uniformly on $[0,1]$ and
each $P_n$ is a homeomorphism of $[0,1]$ onto itself. [Hint: First assume that $f$ is
$C^1$.]

50. Let $f$ be a $C^2$ function on the real line. Assume that $f$ is bounded with bounded
second derivative. Let $A = \sup_x |f(x)|$ and $B = \sup_x |f''(x)|$. Prove that

$$\sup_x |f'(x)| \le 2\sqrt{AB}.$$

51. Let $f$ be continuous on $\mathbb{R}$ and let

$$f_n(x) = \frac{1}{n} \sum_{k=0}^{n-1} f\Bigl(x + \frac{k}{n}\Bigr).$$

Prove that $f_n(x)$ converges uniformly to a limit on every finite interval $[a,b]$.

52. Let $f$ be a real-valued continuous function on the compact interval $[a,b]$. Given
$\epsilon > 0$, show that there is a polynomial $p$ such that

$$p(a) = f(a), \quad p'(a) = 0, \quad \text{and} \quad |p(x) - f(x)| < \epsilon$$

for all $x \in [a,b]$.

53. A function $f : [0,1] \to \mathbb{R}$ is said to be upper semicontinuous if, given $x \in [0,1]$
and $\epsilon > 0$, there exists a $\delta > 0$ such that $|y - x| < \delta$ implies that
$f(y) < f(x) + \epsilon$. Prove that an upper semicontinuous function on $[0,1]$ is
bounded above and attains its maximum value at some point $p \in [0,1]$.

54. Let $f(x)$, $0 \le x \le 1$, be a continuous real function with continuous derivative
$f'(x)$. Let $M$ be the supremum of $|f'(x)|$, $0 \le x \le 1$. Prove the following: For
$n = 1, 2, \ldots$,

$$\left| \frac{1}{n} \sum_{k=0}^{n-1} f\Bigl(\frac{k}{n}\Bigr) - \int_0^1 f(x)\, dx \right| \le \frac{M}{2n}.$$

55. Let $K$ be a compact subset of $\mathbb{R}^m$ and let $(B_j)$ be a sequence of open balls
which cover $K$. Prove that there is an $\epsilon > 0$ such that each $\epsilon$-ball centered at
a point of $K$ is contained in at least one of the balls $B_j$.

56. Let $f$ be a continuous real-valued function on $[0, \infty)$ such that

$$\lim_{x \to \infty} \Bigl( f(x) + \int_0^x f(t)\, dt \Bigr)$$

exists (and is finite). Prove that $\lim_{x \to \infty} f(x) = 0$.
57. A standard theorem asserts that a continuous real-valued function on a compact
set is bounded. Prove the converse: If $K$ is a subset of $\mathbb{R}^m$ and if every
continuous real-valued function defined on $K$ is bounded, then $K$ is compact.

58. Let $\mathcal{F}$ be a uniformly bounded equicontinuous family of real-valued functions
defined on the metric space $X$. Prove that the function

$$g(x) = \sup\{f(x) : f \in \mathcal{F}\}$$

is continuous.

59. Suppose that $(f_n)$ is a sequence of nondecreasing functions which map the unit
interval into itself. Suppose that $\lim_{n \to \infty} f_n(x) = f(x)$ pointwise and that $f$ is a
continuous function. Prove that $f_n(x) \to f(x)$ uniformly as $n \to \infty$. Note that
the functions $f_n$ are not necessarily continuous.

60. Does there exist a continuous real-valued function $f(x)$, $0 \le x \le 1$, such that

$$\int_0^1 x f(x)\, dx = 1 \quad \text{and} \quad \int_0^1 x^n f(x)\, dx = 0$$

for all $n = 0, 2, 3, 4, 5, \ldots$? Give a proof or counterexample.

61. Let $f$ be a continuous, strictly increasing function from $[0, \infty)$ onto $[0, \infty)$ and
let $g = f^{-1}$ (the inverse, not the reciprocal). Prove that

$$\int_0^a f(x)\, dx + \int_0^b g(y)\, dy \ge ab$$

for all positive numbers $a$, $b$, and determine the condition for equality.

62. Let $f$ be a function $[0,1] \to \mathbb{R}$ whose graph $\{(x, f(x)) : x \in [0,1]\}$ is a closed
subset of the unit square. Prove that $f$ is continuous.

63. Let $(a_n)$ be a sequence of positive numbers such that $\sum a_n$ converges. Prove
that there exists a sequence of numbers $c_n \to \infty$ as $n \to \infty$ such that $\sum c_n a_n$
converges.

64. Let $f(x,y)$ be a continuous real-valued function defined on the unit square
$[0,1] \times [0,1]$. Prove that $g(x) = \max\{f(x,y) : y \in [0,1]\}$ is continuous.

65. Let the function $f$ from $[0,1]$ to $[0,1]$ have the following properties. It is of
class $C^1$, $f(0) = 0 = f(1)$, and $f'$ is nonincreasing (i.e., $f$ is concave). Prove
that the arclength of the graph of $f$ does not exceed 3.

66. Let $A$ be the set of all positive integers that do not contain the digit 9 in their
decimal expansions. Prove that

$$\sum_{a \in A} \frac{1}{a} < \infty.$$

That is, $A$ defines a convergent subseries of the harmonic series.
Multivariable Calculus


This chapter presents the natural geometric theory of calculus in n dimensions. 


1 Linear Algebra 

It will be taken for granted that you are familiar with the basic concepts of linear
algebra - vector spaces, linear transformations, matrices, determinants, and dimension.
In particular, you should be aware of the fact that an $m \times n$ matrix $A$ with
entries $a_{ij}$ is more than just a static array of $mn$ numbers. It is dynamic. It can act.
It defines a linear transformation $T_A : \mathbb{R}^n \to \mathbb{R}^m$ that sends $n$-space to $m$-space
according to the formula

$$T_A(v) = \sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij} v_j e_i$$

where $v = \sum v_j e_j \in \mathbb{R}^n$ and $e_1, \ldots, e_n$ is the standard basis of $\mathbb{R}^n$. (Equally, $e_1, \ldots, e_m$
is the standard basis of $\mathbb{R}^m$.)
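The formula for $T_A$ can be tried out directly. The following is only a minimal sketch, assuming a matrix is stored as a Python list of rows; the helper name `apply_matrix` is hypothetical and chosen just for this illustration.

```python
def apply_matrix(A, v):
    """Return T_A(v) for an m x n matrix A (a list of m rows) and v in R^n."""
    m, n = len(A), len(A[0])
    if len(v) != n:
        raise ValueError("v needs one component per column of A")
    # The i-th coordinate of T_A(v) is sum_j a_ij * v_j.
    return [sum(A[i][j] * v[j] for j in range(n)) for i in range(m)]


A = [[1, 2, 3],
     [4, 5, 6]]                      # a 2 x 3 matrix, so T_A maps R^3 to R^2
print(apply_matrix(A, [1, 0, -1]))   # [-2, -2]
print(apply_matrix(A, [0, 1, 0]))    # [2, 5], the second column of A
```

Applying $T_A$ to the basis vector $e_j$ returns the $j$-th column of $A$, which is one way to see the matrix "acting."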

The set $\mathcal{M} = \mathcal{M}(m, n)$ of all $m \times n$ matrices with real entries $a_{ij}$ is a vector space.
Its vectors are matrices. You add two matrices by adding the corresponding entries,
$A + B = C$ where $a_{ij} + b_{ij} = c_{ij}$. Similarly, if $\lambda \in \mathbb{R}$ is a scalar then $\lambda A$ is the matrix
with entries $\lambda a_{ij}$. The dimension of the vector space $\mathcal{M}$ is $mn$, as can be seen by
expressing each $A$ as $\sum a_{ij} E_{ij}$ where $E_{ij}$ is the matrix whose entries are 0, except for
the $(ij)$th entry which is 1. Thus, as vector spaces, $\mathcal{M} = \mathbb{R}^{mn}$.
The set $\mathcal{L} = \mathcal{L}(\mathbb{R}^n, \mathbb{R}^m)$ of linear transformations $T : \mathbb{R}^n \to \mathbb{R}^m$ is also a vector
space. You combine linear transformations as functions, $U = T + S$ being defined by
$U(v) = T(v) + S(v)$, and $\lambda T$ being defined by $(\lambda T)(v) = \lambda T(v)$. The vectors in $\mathcal{L}$ are
linear transformations. The mapping $A \mapsto T_A$ is an isomorphism $T : \mathcal{M} \to \mathcal{L}$. The
matrix $A$ is said to represent the linear transformation $T_A : \mathbb{R}^n \to \mathbb{R}^m$. As a rule of
thumb, think with linear transformations and compute with matrices.

Corresponding to composition of linear transformations is the product of matrices.
If $A$ is an $m \times k$ matrix and $B$ is a $k \times n$ matrix then the product matrix $P = AB$ is
the $m \times n$ matrix whose $(ij)$th entry is

$$p_{ij} = a_{i1} b_{1j} + \cdots + a_{ik} b_{kj} = \sum_{r=1}^{k} a_{ir} b_{rj}.$$


1 Theorem $T_A \circ T_B = T_{AB}$.

Proof For each pair of basis vectors $e_r \in \mathbb{R}^k$ and $e_j \in \mathbb{R}^n$ we have

$$T_A(e_r) = \sum_{i=1}^{m} a_{ir} e_i \qquad T_B(e_j) = \sum_{r=1}^{k} b_{rj} e_r.$$

Thus for each basis vector $e_j$ we have

$$(T_A \circ T_B)(e_j) = T_A\Bigl(\sum_{r=1}^{k} b_{rj} e_r\Bigr) = \sum_{r=1}^{k} b_{rj} T_A(e_r) = \sum_{r=1}^{k} b_{rj} \sum_{i=1}^{m} a_{ir} e_i$$

$$= \sum_{r=1}^{k} \sum_{i=1}^{m} b_{rj} a_{ir} e_i = \sum_{i=1}^{m} \sum_{r=1}^{k} a_{ir} b_{rj} e_i = \sum_{i=1}^{m} p_{ij} e_i = T_{AB}(e_j).$$

Two linear transformations that are equal on a basis are equal. □


Theorem 1 expresses the pleasing fact that matrix multiplication corresponds nat- 
urally to composition of linear transformations. See also Exercise 6. 
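Theorem 1 can be spot-checked numerically by comparing both sides on the standard basis, exactly as in the proof. This is a small sketch under the assumption that matrices are lists of rows with integer entries (so equality is exact); `apply_matrix`, `mat_mul`, and `standard_basis` are hypothetical helper names.

```python
def apply_matrix(A, v):
    """T_A(v): multiply the matrix A (list of rows) by the vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def mat_mul(A, B):
    """Product P = AB with p_ij = sum_r a_ir * b_rj."""
    k = len(B)
    if any(len(row) != k for row in A):
        raise ValueError("columns of A must match rows of B")
    n = len(B[0])
    return [[sum(A[i][r] * B[r][j] for r in range(k)) for j in range(n)]
            for i in range(len(A))]


def standard_basis(n):
    """The vectors e_1, ..., e_n of R^n."""
    return [[1 if i == j else 0 for i in range(n)] for j in range(n)]


A = [[1, 2, 0],
     [0, 1, -1]]        # 2 x 3, so T_A : R^3 -> R^2
B = [[1, 0],
     [2, 1],
     [0, 3]]            # 3 x 2, so T_B : R^2 -> R^3
P = mat_mul(A, B)
print(P)                # [[5, 2], [2, -2]]

# T_A(T_B(e_j)) agrees with T_AB(e_j) on each basis vector e_j of R^2.
for e in standard_basis(2):
    assert apply_matrix(A, apply_matrix(B, e)) == apply_matrix(P, e)
```

Checking only the basis vectors suffices for the same reason as in the proof: two linear transformations that agree on a basis are equal.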


As explained in Chapter 1, a norm on a vector space $V$ is a function $|\ | : V \to \mathbb{R}$
that satisfies three properties:

(a) For all $v \in V$ we have $|v| \ge 0$; and $|v| = 0$ if and only if $v = 0$.

(b) $|\lambda v| = |\lambda|\,|v|$.

(c) $|v + w| \le |v| + |w|$.

(Note the abuse of notation in (b); $|\lambda|$ is the magnitude of the scalar $\lambda$ and $|v|$ is the
norm of the vector $v$.) Norms are used to make vector estimates, and vector estimates
underlie multivariable calculus.


A vector space with a norm is a normed space. Its norm gives rise to a metric
as

$$d(v, v') = |v - v'|.$$

Thus a normed space is a special kind of metric space.
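As an illustration of how a norm produces a metric, here is a short sketch using two familiar norms on $\mathbb{R}^n$. The function names (`euclidean_norm`, `max_norm`, `induced_metric`) are assumptions made for this example only, and the checks are numerical spot checks, not proofs of the norm properties.

```python
import math


def euclidean_norm(v):
    """|v| = sqrt(v_1^2 + ... + v_n^2)."""
    return math.sqrt(sum(x * x for x in v))


def max_norm(v):
    """|v| = max |v_j|, another norm on R^n."""
    return max(abs(x) for x in v)


def induced_metric(norm):
    """Return the metric d(v, w) = |v - w| that the given norm gives rise to."""
    def d(v, w):
        return norm([a - b for a, b in zip(v, w)])
    return d


d = induced_metric(euclidean_norm)
print(euclidean_norm([3, 4]))    # 5.0
print(d([1, 1], [4, 5]))         # 5.0

# Spot-check the triangle inequality d(u, w) <= d(u, v) + d(v, w) for both norms.
u, v, w = [0.0, 0.0], [1.0, 2.0], [3.0, -1.0]
for norm in (euclidean_norm, max_norm):
    dist = induced_metric(norm)
    assert dist(u, w) <= dist(u, v) + dist(v, w) + 1e-12
```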

If $V$, $W$ are normed spaces then the operator norm of a linear transformation
$T : V \to W$ is

$$\|T\| = \sup\left\{ \frac{|Tv|_W}{|v|_V} : v \ne 0 \right\}.$$

The operator norm of $T$ is the maximum stretch that $T$ imparts to vectors in $V$.
The subscript on the norm indicates the space in question, which for simplicity is
often suppressed.†

The composition of linear transformations obeys the norm inequality

$$\|T \circ S\| \le \|T\|\,\|S\|$$

where $S : U \to V$ and $T : V \to W$. Thinking in terms of stretch, the inequality is
clear: $S$ stretches a vector $u \in U$ by at most $\|S\|$, and $T$ stretches $S(u)$ by at most
$\|T\|$. The net effect on $u$ is a stretch of at most $\|T\|\,\|S\|$.
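To make "maximum stretch" concrete, the sketch below treats the special case of $2 \times 2$ matrices with the Euclidean norm on $\mathbb{R}^2$. It relies on the standard fact that in this case $\|A\|$ is the square root of the largest eigenvalue of $A^T A$, and compares that with a brute-force maximum of $|Av|$ over sampled unit vectors. All helper names are hypothetical, and other norms would need a different computation.

```python
import math


def apply_matrix(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def mat_mul(A, B):
    return [[sum(A[i][r] * B[r][j] for r in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def op_norm_2x2(A):
    """Euclidean operator norm of a 2 x 2 matrix: sqrt of the top eigenvalue of A^T A."""
    (a, b), (c, d) = A
    p = a * a + c * c            # (A^T A)_11
    q = a * b + c * d            # (A^T A)_12 = (A^T A)_21
    r = b * b + d * d            # (A^T A)_22
    lam_max = 0.5 * ((p + r) + math.sqrt((p - r) ** 2 + 4 * q * q))
    return math.sqrt(lam_max)


def sampled_stretch(A, samples=3600):
    """Largest |Av| over sampled unit vectors v; a lower estimate of the sup."""
    best = 0.0
    for i in range(samples):
        t = 2 * math.pi * i / samples
        Av = apply_matrix(A, [math.cos(t), math.sin(t)])
        best = max(best, math.hypot(Av[0], Av[1]))
    return best


T = [[1.0, 2.0], [0.0, 1.0]]     # a shear
S = [[2.0, 0.0], [1.0, 1.0]]

# The sampled maximum stretch approaches the operator norm from below.
assert sampled_stretch(T) <= op_norm_2x2(T) + 1e-9
assert op_norm_2x2(T) - sampled_stretch(T) < 1e-4

# The norm inequality ||T o S|| <= ||T|| ||S|| for this pair.
assert op_norm_2x2(mat_mul(T, S)) <= op_norm_2x2(T) * op_norm_2x2(S) + 1e-12
```

For this particular pair the inequality is strict; equality would require the direction that $S$ stretches most to be carried onto the direction that $T$ stretches most.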


2 Theorem Let $T : V \to W$ be a linear transformation from one normed space to
another. The following are equivalent:

(a) $\|T\| < \infty$.

(b) $T$ is uniformly continuous.

(c) $T$ is continuous.

(d) $T$ is continuous at the origin.

Proof Assume (a), $\|T\| < \infty$. For all $v, v' \in V$, linearity of $T$ implies that

$$|Tv - Tv'| \le \|T\|\,|v - v'|,$$

which gives (b), uniform continuity. Clearly (b) implies (c) implies (d).

† If $\|T\|$ is finite then $T$ is said to be a bounded linear transformation. Unfortunately, this
terminology conflicts with $T$ being bounded as a mapping from the metric space $V$ to the metric space
$W$. The only linear transformation that is bounded in the latter sense is the zero transformation.
Assume (d) and take $\epsilon = 1$. There is a $\delta > 0$ such that if $u \in V$ and $|u| < \delta$ then

$$|Tu| < 1.$$

For any nonzero $v \in V$, set $u = \lambda v$ where $\lambda = \delta / (2|v|)$. Then $|u| = \delta/2 < \delta$ and

$$\frac{|Tv|}{|v|} = \frac{|Tu|}{|u|} < \frac{1}{|u|} = \frac{2}{\delta}$$

which implies $\|T\| \le 2/\delta$ and verifies (a). □


3 Theorem Every linear transformation T : R n — > W is continuous and every iso- 
morphism T : R n — > W is a homeomorphism. 


Proof The norm on R n is the Euclidean norm. If v — (v \, . . . , v n ) G R n then 


v 


— \/ • • • T v^. 


Let | denote the norm on W and let M — max{|T(ei)|uG • • • , \T(e n )\w}- For 
v — Y/ v j e j G R n we have \vj\ < |u| and 


n 


n 


\Tv\ w < J2\ T ( v i e j) 

3 = 1 


w — 'y ^ \ v j \ \T( e j)\w A n\v\M 

3 = 1 


which implies that T < nM < oo. Theorem 2 implies that T is continuous. 


Assume that $T : \mathbb{R}^n \to W$ is an isomorphism. We have just shown that $T$ is
continuous, but what about $T^{-1}$? Continuity of $T$ implies that the $T$-image of the
unit sphere $S^{n-1}$ is compact. Injectivity implies that $O \notin T(S^{n-1})$. Since $O$ and $T(S^{n-1})$
are disjoint compact sets in the metric space $W$, there is a constant $c > 0$ such that
for all $u \in S^{n-1}$ we have $d_W(Tu, O) = |Tu| \ge c$. For each nonzero $v \in \mathbb{R}^n$ we write
$v = \lambda u$ where $\lambda = |v|$ and $u = v/|v|$ is a unit vector. Linearity of $T$ implies $Tv = \lambda Tu$
which gives $|Tv| \ge c|v|$, i.e.,

\[ |v| \le \frac{|Tv|}{c}. \]

For each $w \in W$ let $v = T^{-1}(w)$. Then $w = Tv$ and

\[ |T^{-1}(w)| = |v| \le \frac{|Tv|}{c} = \frac{|w|}{c} \]

gives $\|T^{-1}\| \le 1/c < \infty$, and by Theorem 2 we get continuity of $T^{-1}$. A bicontinuous
bijection is a homeomorphism. □
Figure 106 The minimum distance from $T(S^{n-1})$ to the origin is $\ge c$.

Geometrically speaking, the inequality $|Tv| \ge c|v|$ means that $T$ shrinks each
vector in $\mathbb{R}^n$ by a factor no smaller than $c$, so it follows that $T^{-1}$ expands each vector
in $W$ by a factor no greater than $1/c$. The largest $c$ with the property $|Tv| \ge c|v|$
for all $v$ is the conorm of $T$. See Figure 106 and Exercise 4.
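To make the conorm concrete, here is a small sketch under an extra assumption that the text does not make: both spaces are $\mathbb{R}^2$ with the Euclidean norm. In that setting the norm and conorm of a matrix are its largest and smallest singular values, and the estimate $\|T^{-1}\| \le 1/c$ becomes an equality when $c$ is the conorm. The helper names are hypothetical.

```python
import math

def singular_values(a):
    """Return (largest, smallest) singular value of a 2x2 matrix."""
    p = a[0][0] ** 2 + a[1][0] ** 2
    r = a[0][1] ** 2 + a[1][1] ** 2
    q = a[0][0] * a[0][1] + a[1][0] * a[1][1]
    mean = (p + r) / 2
    spread = math.sqrt(((p - r) / 2) ** 2 + q ** 2)
    return math.sqrt(mean + spread), math.sqrt(max(mean - spread, 0.0))

def inverse(a):
    """Inverse of an invertible 2x2 matrix."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]

def stretch(a, angle):
    """|Tu| for the unit vector u = (cos angle, sin angle)."""
    u = (math.cos(angle), math.sin(angle))
    return math.hypot(a[0][0] * u[0] + a[0][1] * u[1],
                      a[1][0] * u[0] + a[1][1] * u[1])

T = [[2.0, 1.0], [0.0, 0.5]]
norm_T, conorm_T = singular_values(T)
norm_inv, _ = singular_values(inverse(T))

# Every unit vector is stretched by a factor between the conorm and the norm.
samples = [stretch(T, 2 * math.pi * k / 720) for k in range(720)]
print(conorm_T, min(samples), max(samples), norm_T)
print(norm_inv, 1 / conorm_T)
```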


4 Corollary In the world of finite-dimensional normed spaces, all linear transformations
are continuous and all isomorphisms are homeomorphisms. In particular, if a
finite-dimensional vector space is equipped with two different norms then the identity
map is a homeomorphism between the two normed spaces. In particular $T : \mathcal{M} \to \mathcal{L}$
is a homeomorphism.


Proof Let $V$ be an $n$-dimensional normed space and let $T : V \to W$ be a linear
transformation. As you know from linear algebra, there is an isomorphism
$H : \mathbb{R}^n \to V$. Theorem 3 implies that $H$ is a homeomorphism. Therefore $H^{-1}$ is a
homeomorphism. Since $T \circ H$ is a linear transformation from $\mathbb{R}^n$ to $W$ it is continuous.
Thus

\[ T = (T \circ H) \circ H^{-1} \]

is the composition of continuous maps so it is continuous.

Suppose that $T : V \to W$ is an isomorphism and $V$ is finite-dimensional. Then
$W$ is finite-dimensional and $T^{-1} : W \to V$ is a linear transformation. Since every
linear transformation from a finite-dimensional normed space to a normed space is
continuous, $T$ and $T^{-1}$ are both continuous, so $T$ is a homeomorphism.

Let a finite-dimensional vector space $V$ be equipped with norms $|\ |_1$ and $|\ |_2$.
Since the identity map is an isomorphism $V_1 \to V_2$ it is a homeomorphism. The same
applies to the isomorphism $T$ that assigns to a matrix $A$ the corresponding linear
transformation $T_A$. □


2 Derivatives 

A function of a real variable $y = f(x)$ has a derivative $f'(x)$ at $x$ when

\[ \lim_{h \to 0} \frac{f(x + h) - f(x)}{h} = f'(x). \tag{1} \]

If, however, $x$ is a vector variable, (1) makes no sense. For what does it mean to
divide by the vector increment $h$? Equivalent to (1) is the condition

\[ f(x + h) = f(x) + f'(x)h + R(h) \quad\Rightarrow\quad \lim_{h \to 0} \frac{R(h)}{|h|} = 0, \]

which is easy to recast in vector terms.


Definition Let $f : U \to \mathbb{R}^m$ be given where $U$ is an open subset of $\mathbb{R}^n$. The function
$f$ is differentiable at $p \in U$ with derivative $(Df)_p = T$ if $T : \mathbb{R}^n \to \mathbb{R}^m$ is a linear
transformation and

\[ f(p + v) = f(p) + T(v) + R(v) \quad\Rightarrow\quad \lim_{|v| \to 0} \frac{R(v)}{|v|} = 0. \tag{2} \]

We say that the Taylor remainder $R$ is sublinear because it tends to 0 faster than $|v|$.

When $n = m = 1$, the multidimensional definition reduces to the standard one.
This is because a linear transformation $\mathbb{R} \to \mathbb{R}$ is just multiplication by some real
number, in this case multiplication by $f'(x)$.

Here is how to visualize $Df$. Take $m = n = 2$. The mapping $f : U \to \mathbb{R}^2$ distorts
shapes nonlinearly; its derivative describes the linear part of the distortion. Circles
are sent by $f$ to wobbly ovals, but they become ellipses under $(Df)_p$. Lines are sent
by $f$ to curves, but they become straight lines under $(Df)_p$. See Figure 107 and also
Appendix A.

Figure 107 $(Df)_p$ is the linear part of $f$ at $p$.

This way of looking at differentiability is conceptually simple. Near $p$, $f$ is the
sum of three terms: a constant term $q = fp$, a linear term $(Df)_p v$, and a sublinear
remainder term $R(v)$. Keep in mind what kind of an object the derivative is. It is
not a number. It is not a vector. No, if it exists then $(Df)_p$ is a linear transformation
from the domain space to the target space.
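The three-term decomposition can be checked by hand on a concrete map. The sketch below is my own illustration, with hypothetical names: it uses $f(x, y) = (x^2 - y^2, 2xy)$, whose Jacobian matrix is written out explicitly, and measures $|R(v)|/|v|$ as $v$ shrinks. For this particular $f$ the remainder is $R(v) = (v_1^2 - v_2^2, 2v_1v_2)$, so the ratio should equal $|v|$.

```python
import math

def f(x, y):
    return (x * x - y * y, 2 * x * y)

def Df(x, y):
    """Jacobian matrix of f at (x, y)."""
    return [[2 * x, -2 * y], [2 * y, 2 * x]]

def remainder_ratio(p, v):
    """|R(v)| / |v| where R(v) = f(p + v) - f(p) - (Df)_p v."""
    fp = f(*p)
    fq = f(p[0] + v[0], p[1] + v[1])
    J = Df(*p)
    lin = (J[0][0] * v[0] + J[0][1] * v[1],
           J[1][0] * v[0] + J[1][1] * v[1])
    R = (fq[0] - fp[0] - lin[0], fq[1] - fp[1] - lin[1])
    return math.hypot(*R) / math.hypot(*v)

p = (1.0, 2.0)
scales = (1.0, 0.1, 0.01, 0.001)
# v = s * (0.3, -0.4) has length 0.5 * s.
ratios = [remainder_ratio(p, (0.3 * s, -0.4 * s)) for s in scales]
print(ratios)
```

The ratios shrink in proportion to $|v|$, which is what sublinearity of $R$ asks for.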

5 Theorem If $f$ is differentiable at $p$ then it unambiguously determines $(Df)_p$ according
to the limit formula, valid for all $u \in \mathbb{R}^n$,

\[ (Df)_p(u) = \lim_{t \to 0} \frac{f(p + tu) - f(p)}{t}. \tag{3} \]

Proof Let $T$ be a linear transformation that satisfies (2). Fix any $u \in \mathbb{R}^n$ and take
$v = tu$. Then

\[ \frac{f(p + tu) - f(p)}{t} = \frac{T(tu) + R(tu)}{t} = T(u) + \frac{R(tu)}{t|u|}\,|u|. \]

The last term converges to zero as $t \to 0$, which verifies (3). Limits, when they exist,
are unambiguous and therefore if $T'$ is a second linear transformation that satisfies
(2) then $T(u) = T'(u)$ so $T = T'$. □

6 Theorem Differentiability implies continuity. 

Proof Differentiability at $p$ implies that

\[ |f(p + v) - f(p)| = |(Df)_p v + R(v)| \le \|(Df)_p\|\,|v| + |R(v)| \to 0 \]

as $p + v \to p$. □
$Df$ is the total derivative or Fréchet derivative. In contrast, the $ij$th partial
derivative of $f$ at $p$ is the limit, if it exists,

\[ \frac{\partial f_i(p)}{\partial x_j} = \lim_{t \to 0} \frac{f_i(p + te_j) - f_i(p)}{t}. \]

7 Corollary If the total derivative exists then the partial derivatives exist and they 
are the entries of the matrix that represents the total derivative. 

Proof Substitute in (3) the vector $u = e_j$ and take the $i$th component of both sides
of the resulting equation. □


As is shown in Exercise 15, the mere existence of partial derivatives does not imply 
differentiability. The simplest sufficient condition beyond the existence of the partials 
- and the simplest way to recognize differentiability - is given in the next theorem. 

8 Theorem If the partial derivatives of $f : U \to \mathbb{R}^m$ exist and are continuous then
$f$ is differentiable.

Proof Let $A$ be the matrix of partials at $p$, $A = [\partial f_i(p)/\partial x_j]$, and let $T : \mathbb{R}^n \to \mathbb{R}^m$
be the linear transformation that $A$ represents. We claim that $(Df)_p = T$. We must
show that the Taylor remainder

\[ R(v) = f(p + v) - f(p) - Av \]

is sublinear. Draw a path $\sigma = [\sigma_1, \ldots, \sigma_n]$ from $p$ to $q = p + v$ that consists of $n$
segments parallel to the components of $v$. Thus $v = \sum v_j e_j$ and

\[ \sigma_j(t) = p_{j-1} + t v_j e_j \qquad 0 \le t \le 1 \]

is a segment from $p_{j-1} = p + \sum_{k<j} v_k e_k$ to $p_j = p_{j-1} + v_j e_j$. See Figure 108.

By the one-dimensional chain rule and mean value theorem applied to the differentiable
real-valued function $g(t) = f_i \circ \sigma_j(t)$ of one variable, there exists $t_{ij} \in (0, 1)$
such that

\[ f_i(p_j) - f_i(p_{j-1}) = g(1) - g(0) = g'(t_{ij}) = \frac{\partial f_i(p_{ij})}{\partial x_j}\,v_j \]

where $p_{ij} = \sigma_j(t_{ij})$. Telescoping $f_i(p + v) - f_i(p)$ along $\sigma$ gives

\[ R_i(v) = f_i(p + v) - f_i(p) - (Av)_i
 = \sum_{j=1}^n \left( f_i(p_j) - f_i(p_{j-1}) - \frac{\partial f_i(p)}{\partial x_j}\,v_j \right)
 = \sum_{j=1}^n \left\{ \frac{\partial f_i(p_{ij})}{\partial x_j} - \frac{\partial f_i(p)}{\partial x_j} \right\} v_j. \]

Figure 108 The segmented path $\sigma$ from $p$ to $q$.

Continuity of the partials implies that the terms inside the curly brackets tend to 0
as $|v| \to 0$. Thus $R$ is sublinear and $f$ is differentiable at $p$. □
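Continuity of the partials cannot simply be dropped from Theorem 8. A standard example, which I am supplying here and which is not necessarily the one the exercises have in mind, is $f(x, y) = xy/(x^2 + y^2)$ with $f(0, 0) = 0$. Both partials exist at the origin, yet $f$ is not even continuous there, so by Theorem 6 it cannot be differentiable. The sketch also shows the $x$-partial blowing up near the origin.

```python
def f(x, y):
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y / (x * x + y * y)

def partial_x_at_origin(t):
    """Difference quotient (f(t, 0) - f(0, 0)) / t."""
    return (f(t, 0.0) - f(0.0, 0.0)) / t

def partial_y_at_origin(t):
    """Difference quotient (f(0, t) - f(0, 0)) / t."""
    return (f(0.0, t) - f(0.0, 0.0)) / t

def df_dx(x, y):
    """Exact x-partial of f away from the origin."""
    return y * (y * y - x * x) / (x * x + y * y) ** 2

steps = [10.0 ** -k for k in range(1, 6)]
print([partial_x_at_origin(t) for t in steps])  # f vanishes on the x-axis
print([partial_y_at_origin(t) for t in steps])  # f vanishes on the y-axis
print([f(t, t) for t in steps])                 # f is 1/2 on the diagonal
print([df_dx(0.0, t) for t in steps])           # grows like 1/t
```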

Next we state and prove the basic rules of multivariable differentiation. 

9 Theorem Let $f$ and $g$ be differentiable. Then

(a) $D(f + cg) = Df + cDg$.

(b) $D(\text{constant}) = 0$ and $D(T(x)) = T$.

(c) $D(g \circ f) = Dg \circ Df$. (Chain Rule)

(d) $D(f \bullet g) = Df \bullet g + f \bullet Dg$. (Leibniz Rule)

There is a fifth rule that concerns the derivative of the nonlinear inversion operator
$\operatorname{Inv} : T \mapsto T^{-1}$. It is a glorified version of the formula

\[ \frac{d\,x^{-1}}{dx} = -x^{-2}, \]

and is discussed in Exercises 32 - 36.


Proof (a) Write the Taylor estimates for $f$ and $g$ and combine them to get the Taylor
estimate for $f + cg$.

\[ \begin{aligned}
f(p + v) &= f(p) + (Df)_p(v) + R_f \\
g(p + v) &= g(p) + (Dg)_p(v) + R_g \\
(f + cg)(p + v) &= (f + cg)(p) + ((Df)_p + c(Dg)_p)(v) + R_f + cR_g.
\end{aligned} \]

Since $R_f + cR_g$ is sublinear, $(Df)_p + c(Dg)_p$ is the derivative of $f + cg$ at $p$.



(b) If $f : \mathbb{R}^n \to \mathbb{R}^m$ is constant, $f(x) = c$ for all $x \in \mathbb{R}^n$, and if $O : \mathbb{R}^n \to \mathbb{R}^m$
denotes the zero transformation then the Taylor remainder $R(v) = f(p + v) - f(p) - O(v)$
is identically zero. Hence $D(\text{constant})_p = O$.

Suppose $T : \mathbb{R}^n \to \mathbb{R}^m$ is a linear transformation. If $f(x) = T(x)$ for all $x$ then substituting
$T$ itself in the Taylor expression gives the Taylor remainder $R(v) = f(p + v) - f(p) - T(v)$,
which is identically zero. Hence $(DT)_p = T$.

Note that when $n = m = 1$, a linear function is of the form $f(x) = ax$, and the
previous formula just states that $(ax)' = a$.

(c) Tacitly, we assume that the composite $g \circ f(x) = g(f(x))$ makes sense as $x$
varies in a neighborhood of $p \in U$. The notation $Dg \circ Df$ refers to the composite of
linear transformations and is written out as

\[ D(g \circ f)_p = (Dg)_q \circ (Df)_p \]

where $q = f(p)$. The Chain Rule states that the derivative of a composite is the
composite of the derivatives. Such a beautiful and natural formula must be true. See
also Appendix A. Here is a proof.

It is convenient to write the remainder $R(v) = f(p + v) - f(p) - T(v)$ in a different
form, defining the scalar function $e(v)$ by

\[ e(v) = \begin{cases} \dfrac{|R(v)|}{|v|} & \text{if } v \ne 0 \\[1ex] 0 & \text{if } v = 0. \end{cases} \]

Sublinearity is equivalent to $\lim_{v \to 0} e(v) = 0$. Think of $e$ as an "error factor."
The Taylor expressions for $f$ at $p$ and $g$ at $q = f(p)$ are

\[ \begin{aligned}
f(p + v) &= f(p) + Av + R_f \\
g(q + w) &= g(q) + Bw + R_g
\end{aligned} \]

where $A = (Df)_p$ and $B = (Dg)_q$ as matrices. The composite is expressed as

\[ g \circ f(p + v) = g(q + Av + R_f(v)) = g(q) + BAv + BR_f(v) + R_g(w) \]

where $w = Av + R_f(v)$. It remains to show that the remainder terms are sublinear
with respect to $v$. First

\[ |BR_f(v)| \le \|B\|\,|R_f(v)| \]

is sublinear. Second,

\[ |w| = |Av + R_f(v)| \le \|A\|\,|v| + e_f(v)\,|v|. \]

Therefore,

\[ |R_g(w)| \le e_g(w)\,|w| \le e_g(w)\,(\|A\| + e_f(v))\,|v|. \]

Since $e_g(w) \to 0$ as $w \to 0$ and since $v \to 0$ implies that $w$ does tend to 0, we see that
$R_g(w)$ is sublinear with respect to $v$. It follows that $(D(g \circ f))_p = BA$ as claimed.
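The formula $(D(g \circ f))_p = BA$ can be sanity-checked numerically. The following sketch is an added illustration with made-up polynomial maps $f, g : \mathbb{R}^2 \to \mathbb{R}^2$ and hypothetical helper names. It compares the matrix product of the two hand-computed Jacobians with a central-difference Jacobian of the composite.

```python
def f(x, y):
    return (x * y, x + y * y)

def Df(x, y):
    """Jacobian matrix A of f."""
    return [[y, x], [1.0, 2 * y]]

def g(u, w):
    return (u * u + w, u * w)

def Dg(u, w):
    """Jacobian matrix B of g."""
    return [[2 * u, 1.0], [w, u]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def numeric_jacobian(h, p, eps=1e-6):
    """Central-difference Jacobian of h : R^2 -> R^2 at p."""
    cols = []
    for j in range(2):
        plus, minus = list(p), list(p)
        plus[j] += eps
        minus[j] -= eps
        hp, hm = h(*plus), h(*minus)
        cols.append([(hp[i] - hm[i]) / (2 * eps) for i in range(2)])
    # cols[j][i] approximates d h_i / d x_j; transpose into row-by-component form.
    return [[cols[j][i] for j in range(2)] for i in range(2)]

p = (0.7, -1.3)
q = f(*p)
BA = matmul(Dg(*q), Df(*p))                      # (Dg)_q o (Df)_p
J = numeric_jacobian(lambda x, y: g(*f(x, y)), p)
max_gap = max(abs(BA[i][j] - J[i][j]) for i in range(2) for j in range(2))
print(max_gap)
```

Note that $B$ is evaluated at $q = f(p)$, not at $p$; evaluating it at the wrong point is the most common slip when applying the rule.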

(d) To prove the Leibniz Product Rule, we must explain the notation $v \bullet w$. In
$\mathbb{R}$ there is only one product, the usual multiplication of real numbers. In
higher-dimensional vector spaces, however, there are many products and the general way to
discuss products is in terms of bilinear maps.

A map $\beta : V \times W \to Z$ is bilinear if $V, W, Z$ are vector spaces and for each
fixed $v \in V$ the map $\beta(v, \cdot) : W \to Z$ is linear, while for each fixed $w \in W$ the map
$\beta(\cdot, w) : V \to Z$ is linear. Examples are

(i) Ordinary real multiplication $(x, y) \mapsto xy$ is a bilinear map $\mathbb{R} \times \mathbb{R} \to \mathbb{R}$.

(ii) The dot product is a bilinear map $\mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$.

(iii) The matrix product is a bilinear map $\mathcal{M}(m \times k) \times \mathcal{M}(k \times n) \to \mathcal{M}(m \times n)$.

The precise statement of (d) is that if $\beta : \mathbb{R}^k \times \mathbb{R}^\ell \to \mathbb{R}^m$ is bilinear while
$f : U \to \mathbb{R}^k$ and $g : U \to \mathbb{R}^\ell$ are differentiable at $p$ then the map $x \mapsto \beta(f(x), g(x))$ is
differentiable at $p$ and

\[ (D\beta(f, g))_p(v) = \beta((Df)_p(v), g(p)) + \beta(f(p), (Dg)_p(v)). \]


Just as a linear transformation between finite-dimensional vector spaces has a finite
operator norm, the same is true for bilinear maps:

\[ \|\beta\| = \sup\left\{ \frac{|\beta(v, w)|}{|v|\,|w|} : v, w \ne 0 \right\} < \infty. \]

To check this we view $\beta$ as a linear map $T_\beta : \mathbb{R}^k \to \mathcal{L}(\mathbb{R}^\ell, \mathbb{R}^m)$. According to
Theorems 2 and 3, a linear transformation from one finite-dimensional normed space
to another is continuous and has finite operator norm. Thus the operator norm $\|T_\beta\|$
is finite. That is,

\[ \|T_\beta\| = \max\left\{ \frac{\|T_\beta(v)\|}{|v|} : v \ne 0 \right\} < \infty. \]

But $\|T_\beta(v)\| = \max\{ |\beta(v, w)|/|w| : w \ne 0 \}$, which implies that $\|\beta\| < \infty$.
Returning to the proof of the Leibniz Rule, we write out the Taylor estimates for
$f$ and $g$ and plug them into $\beta$. If we use the notation $A = (Df)_p$ and $B = (Dg)_p$,
then bilinearity implies

\[ \begin{aligned}
\beta(f(p + v), g(p + v)) &= \beta(f(p) + Av + R_f,\ g(p) + Bv + R_g) \\
&= \beta(f(p), g(p)) + \beta(Av, g(p)) + \beta(f(p), Bv) \\
&\quad + \beta(f(p), R_g) + \beta(Av, Bv + R_g) + \beta(R_f, g(p) + Bv + R_g).
\end{aligned} \]

The last three terms are sublinear. For

\[ \begin{aligned}
|\beta(f(p), R_g)| &\le \|\beta\|\,|f(p)|\,|R_g| \\
|\beta(Av, Bv + R_g)| &\le \|\beta\|\,\|A\|\,|v|\,|Bv + R_g| \\
|\beta(R_f, g(p) + Bv + R_g)| &\le \|\beta\|\,|R_f|\,|g(p) + Bv + R_g|.
\end{aligned} \]

Therefore $\beta(f, g)$ is differentiable and $D\beta(f, g) = \beta(Df, g) + \beta(f, Dg)$ as claimed. □
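Here is a numerical check of the Leibniz Rule in the simplest nontrivial case. It is a sketch of my own, taking $\beta$ to be the dot product on $\mathbb{R}^2$; the maps $f$, $g$ and the helper names are hypothetical. The right-hand side $\beta((Df)_p v, g(p)) + \beta(f(p), (Dg)_p v)$ is compared with a central-difference directional derivative of $x \mapsto \langle f(x), g(x) \rangle$.

```python
def f(x, y):
    return (x * y, x + y * y)

def Df(x, y):
    return [[y, x], [1.0, 2 * y]]

def g(x, y):
    return (x * x, 3 * y)

def Dg(x, y):
    return [[2 * x, 0.0], [0.0, 3.0]]

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]

def apply(m, v):
    """Matrix m applied to the vector v."""
    return (m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1])

def leibniz(p, v):
    """beta(Df_p v, g(p)) + beta(f(p), Dg_p v), with beta the dot product."""
    return dot(apply(Df(*p), v), g(*p)) + dot(f(*p), apply(Dg(*p), v))

def directional(p, v, t=1e-6):
    """Central difference of x -> <f(x), g(x)> at p in the direction v."""
    def h(x, y):
        return dot(f(x, y), g(x, y))
    return (h(p[0] + t * v[0], p[1] + t * v[1])
            - h(p[0] - t * v[0], p[1] - t * v[1])) / (2 * t)

p, v = (0.5, 2.0), (1.0, -0.25)
gap = abs(leibniz(p, v) - directional(p, v))
print(leibniz(p, v), directional(p, v), gap)
```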


Here are some applications of these differentiation rules: 

10 Theorem A function $f : U \to \mathbb{R}^m$ is differentiable at $p \in U$ if and only if
each of its components $f_i$ is differentiable at $p$. Furthermore, the derivative of its $i$th
component is the $i$th component of the derivative.

Proof Assume that $f$ is differentiable at $p$ and express the $i$th component of $f$ as
$f_i = \pi_i \circ f$ where $\pi_i : \mathbb{R}^m \to \mathbb{R}$ is the projection that sends a vector $w = (w_1, \ldots, w_m)$
to $w_i$. Since $\pi_i$ is linear it is differentiable. By the Chain Rule, $f_i$ is differentiable at
$p$ and

\[ (Df_i)_p = (D\pi_i)_{f(p)} \circ (Df)_p = \pi_i \circ (Df)_p. \]

The proof of the converse is equally natural. □

Theorem 10 implies there is little loss of generality in assuming $m = 1$, i.e., that
our functions are real-valued. Multidimensionality of the domain, not the target, is
what distinguishes multivariable calculus from one-variable calculus.

11 Mean Value Theorem If $f : U \to \mathbb{R}^m$ is differentiable on $U$ and the segment
$[p, q]$ is contained in $U$ then

\[ |f(q) - f(p)| \le M|q - p| \]

where $M = \sup\{\|(Df)_x\| : x \in U\}$.

Proof Fix any unit vector $u \in \mathbb{R}^m$. The function

\[ g(t) = \langle u, f(p + t(q - p)) \rangle \]

is differentiable and we can calculate its derivative. By the one-dimensional Mean
Value Theorem this gives some $\theta \in (0, 1)$ such that $g(1) - g(0) = g'(\theta)$. That is,

\[ \langle u, f(q) - f(p) \rangle = g'(\theta) = \langle u, (Df)_{p + \theta(q - p)}(q - p) \rangle \le M|q - p|. \]

A vector whose dot product with every unit vector is no larger than $M|q - p|$ has
norm $\le M|q - p|$. □


Remark The one-dimensional Mean Value Theorem is an equality

\[ f(q) - f(p) = f'(\theta)(q - p) \]

and you might expect the same to be true for a vector-valued function if we replace
$f'(\theta)$ by $(Df)_\theta$. Not so. See Exercise 17. The closest we can come to an equality
form of the multidimensional Mean Value Theorem is the following.

12 $C^1$ Mean Value Theorem If $f : U \to \mathbb{R}^m$ is of class $C^1$ (its derivative exists
and is continuous) and if the segment $[p, q]$ is contained in $U$ then

\[ f(q) - f(p) = T(q - p) \tag{4} \]

where $T$ is the average derivative of $f$ on the segment,

\[ T = \int_0^1 (Df)_{p + t(q - p)}\,dt. \]

Conversely, if there is a continuous family of linear maps $T_{pq} \in \mathcal{L}$ for which (4) holds
then $f$ is of class $C^1$ and $(Df)_p = T_{pp}$.

Proof The integrand takes values in the normed space $\mathcal{L}(\mathbb{R}^n, \mathbb{R}^m)$ and is a continuous
function of $t$. The integral is the limit of Riemann sums

\[ \sum_k (Df)_{p + t_k(q - p)}\,\Delta t_k \]

which lie in $\mathcal{L}$. Since the integral is an element of $\mathcal{L}$ it has a right to act on the vector
$q - p$. Alternatively, if you integrate each entry of the matrix that represents $Df$
along the segment then the resulting matrix represents $T$. Fix an index $i$ and apply
the Fundamental Theorem of Calculus to the $C^1$ real-valued function of one variable

\[ g(t) = f_i \circ \sigma(t) \]

where $\sigma(t) = p + t(q - p)$ parameterizes $[p, q]$. This gives

\[ \begin{aligned}
f_i(q) - f_i(p) = g(1) - g(0) &= \int_0^1 g'(t)\,dt \\
&= \int_0^1 \sum_{j=1}^n \frac{\partial f_i(\sigma(t))}{\partial x_j}\,(q_j - p_j)\,dt \\
&= \sum_{j=1}^n \left( \int_0^1 \frac{\partial f_i(\sigma(t))}{\partial x_j}\,dt \right)(q_j - p_j)
\end{aligned} \]

which is the $i$th component of $T(q - p)$.

To check the converse, we assume that (4) holds for a continuous family of linear
maps $T_{pq}$. Take $q = p + v$. The first-order Taylor remainder at $p$ is

\[ R(v) = f(p + v) - f(p) - T_{pp}(v) = (T_{pq} - T_{pp})(v) \]

which is sublinear with respect to $v$. Therefore $(Df)_p = T_{pp}$. □
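A curve that closes up shows both why the equality form with a single $\theta$ fails and why the averaged form (4) works. The sketch below is an added illustration and not necessarily the example in the exercises: it takes $f(t) = (\cos t, \sin t)$ on $[0, 2\pi]$. Here $f(q) - f(p) = 0$ while $f'(\theta)$ is a unit vector for every $\theta$, so no $\theta$ gives equality. The average derivative, approximated by a midpoint Riemann sum, is the zero vector, as (4) requires.

```python
import math

def f(t):
    return (math.cos(t), math.sin(t))

def df(t):
    """Derivative of f; it is a unit vector for every t."""
    return (-math.sin(t), math.cos(t))

p, q = 0.0, 2 * math.pi

def average_derivative(n=1000):
    """Midpoint Riemann sum for T = integral over [0, 1] of (Df)_{p + t(q - p)} dt."""
    sx = sy = 0.0
    for k in range(n):
        t = (k + 0.5) / n
        dx, dy = df(p + t * (q - p))
        sx += dx / n
        sy += dy / n
    return (sx, sy)

T = average_derivative()
lhs = (f(q)[0] - f(p)[0], f(q)[1] - f(p)[1])   # f(q) - f(p)
rhs = (T[0] * (q - p), T[1] * (q - p))         # T(q - p)
min_speed = min(math.hypot(*df(2 * math.pi * k / 360)) for k in range(360))
print(lhs, rhs, min_speed)
```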


13 Corollary Assume that $U$ is connected. If $f : U \to \mathbb{R}^m$ is differentiable and for
each point $x \in U$ we have $(Df)_x = 0$ then $f$ is constant.

Proof The enjoyable open and closed argument is left to you as Exercise 20. □


We conclude this section with another useful rule - differentiation past the 
integral. See also Exercise 23. 

14 Theorem Assume that $f : [a, b] \times (c, d) \to \mathbb{R}$ is continuous and that $\partial f(x, y)/\partial y$
exists and is continuous. Then

\[ F(y) = \int_a^b f(x, y)\,dx \]

is of class $C^1$ and

\[ \frac{dF}{dy} = \int_a^b \frac{\partial f(x, y)}{\partial y}\,dx. \tag{5} \]

Proof By the $C^1$ Mean Value Theorem, if $h$ is small then

\[ \frac{F(y + h) - F(y)}{h} = \frac{1}{h} \int_a^b \left( \int_0^1 \frac{\partial f(x, y + th)}{\partial y}\,dt \right) h\,dx. \]

The inner integral is the partial derivative of $f$ with respect to $y$ averaged along the
segment from $y$ to $y + h$. Continuity implies that this average converges to $\partial f(x, y)/\partial y$
as $h \to 0$, which verifies (5). Continuity of $dF/dy$ follows from continuity of $\partial f/\partial y$.
See Exercise 22. □
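Formula (5) is easy to test on a concrete integrand. The sketch below is an added illustration: it assumes $f(x, y) = \sin(xy)$ on $[a, b] = [0, 1]$ and uses a midpoint rule (the helper `integrate` is hypothetical). It compares a central difference quotient of $F$ with the integral of $\partial f/\partial y = x\cos(xy)$.

```python
import math

def integrate(h, a, b, n=2000):
    """Midpoint rule for the integral of h over [a, b]."""
    width = (b - a) / n
    return sum(h(a + (k + 0.5) * width) for k in range(n)) * width

def F(y):
    """F(y) = integral over [0, 1] of sin(x y) dx."""
    return integrate(lambda x: math.sin(x * y), 0.0, 1.0)

def integral_of_partial(y):
    """Right-hand side of (5): integral over [0, 1] of x cos(x y) dx."""
    return integrate(lambda x: x * math.cos(x * y), 0.0, 1.0)

y, h = 1.5, 1e-5
difference_quotient = (F(y + h) - F(y - h)) / (2 * h)
gap = abs(difference_quotient - integral_of_partial(y))
print(difference_quotient, integral_of_partial(y), gap)
```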
3 Higher Derivatives

In this section we define higher-order multivariable derivatives. We do so in the same
spirit as in the previous section - the second derivative will be the derivative of the
first derivative, viewed naturally. Assume that $f : U \to \mathbb{R}^m$ is differentiable on $U$.
The derivative $(Df)_x$ exists at each $x \in U$ and the map $x \mapsto (Df)_x$ defines a function

\[ Df : U \to \mathcal{L}(\mathbb{R}^n, \mathbb{R}^m). \]

The derivative $Df$ is the same sort of thing that $f$ is, namely a function from an
open subset of a vector space into another vector space. In the case of $Df$ the target
vector space is not $\mathbb{R}^m$ but rather the $mn$-dimensional space $\mathcal{L}$. If $Df$ is differentiable
at $p \in U$ then by definition

\[ (D(Df))_p = (D^2 f)_p = \text{the second derivative of } f \text{ at } p \]

and $f$ is second-differentiable at $p$. The second derivative at $p$ is a linear map
from $\mathbb{R}^n$ into $\mathcal{L}$. For each $v \in \mathbb{R}^n$, $(D^2 f)_p(v)$ belongs to $\mathcal{L}$ and therefore is a linear
transformation $\mathbb{R}^n \to \mathbb{R}^m$ so $(D^2 f)_p(v)(w)$ is bilinear and we write it as

\[ (D^2 f)_p(v, w). \]

(Recall that bilinearity is linearity in each variable separately.)

Third and higher derivatives are defined in the same way. If $f$ is second-differentiable
on $U$ then $x \mapsto (D^2 f)_x$ defines a map

\[ D^2 f : U \to \mathcal{L}^2 \]

where $\mathcal{L}^2$ is the vector space of bilinear maps $\mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}^m$. If $D^2 f$ is differentiable
at $p$ then $f$ is third-differentiable there, and its third derivative is the trilinear map
$(D^3 f)_p = (D(D^2 f))_p$. And so on.

Just as for first derivatives, the relation between the second derivative and the
second partial derivatives calls for thought. Express $f : U \to \mathbb{R}^m$ in component form
as $f(x) = (f_1(x), \ldots, f_m(x))$ where $x$ varies in $U$.

15 Theorem If $(D^2 f)_p$ exists then $(D^2 f_k)_p$ exists, the second partials at $p$ exist, and

\[ (D^2 f_k)_p(e_i, e_j) = \frac{\partial^2 f_k(p)}{\partial x_i\,\partial x_j}. \]

Conversely, existence of the second partials implies existence of $(D^2 f)_p$, provided that
the second partials exist at all points $x \in U$ near $p$ and are continuous at $p$.
Proof Assume that $(D^2 f)_p$ exists. Then $x \mapsto (Df)_x$ is differentiable at $x = p$ and
the same is true of the matrix

\[ M_x = \begin{bmatrix}
\dfrac{\partial f_1(x)}{\partial x_1} & \cdots & \dfrac{\partial f_1(x)}{\partial x_n} \\
\vdots & & \vdots \\
\dfrac{\partial f_m(x)}{\partial x_1} & \cdots & \dfrac{\partial f_m(x)}{\partial x_n}
\end{bmatrix} \]

that represents it; $x \mapsto M_x$ is differentiable at $x = p$. For according to Theorem 10, a
vector function is differentiable if and only if its components are differentiable, and
then the derivative of the $k$th component is the $k$th component of the derivative. A
matrix is a special type of vector. Its components are its entries. Thus the entries
of $M_x$ are differentiable at $x = p$ and the second partials exist. Furthermore, the $k$th
row of $M_x$ is a differentiable vector function of $x$ at $x = p$ and

\[ (D^2 f_k)_p(e_i, e_j) = (D(Df_k))_p(e_i)(e_j) = \lim_{t \to 0} \frac{(Df_k)_{p + te_i}(e_j) - (Df_k)_p(e_j)}{t}. \]

The first derivatives appearing in this fraction are the $j$th partials of $f_k$ at $p + te_i$ and
at $p$. Thus $\partial^2 f_k(p)/\partial x_i \partial x_j = (D^2 f_k)_p(e_i, e_j)$ as claimed.

Conversely, assume that the second partials exist at all $x$ near $p$ and are continuous
at $p$. Then the entries of $M_x$ have partials that exist at all points $q$ near $p$, and are
continuous at $p$. Theorem 8 implies that $x \mapsto M_x$ is differentiable at $x = p$; i.e., $f$ is
second-differentiable at $p$. □


The most important and surprising property of second derivatives is symmetry.

16 Theorem If $(D^2 f)_p$ exists then it is symmetric: For all $v, w \in \mathbb{R}^n$ we have

\[ (D^2 f)_p(v, w) = (D^2 f)_p(w, v). \]

Proof We will assume that $f$ is real-valued (i.e., $m = 1$) because the symmetry
assertion concerns the arguments of $f$ rather than its values. For a variable $t \in [0, 1]$
we draw the parallelogram $P$ determined by the vectors $tv$, $tw$ and label the vertices
with $\pm 1$ as in Figure 109.

The quantity

\[ \Delta = \Delta(t, v, w) = f(p + tv + tw) - f(p + tv) - f(p + tw) + f(p) \]

is the signed sum of $f$ at the vertices of $P$. Clearly $\Delta$ is symmetric with respect to
$v, w$,

\[ \Delta(t, v, w) = \Delta(t, w, v). \]
Figure 109 The parallelogram $P$ has signed vertices.

We claim that

\[ (D^2 f)_p(v, w) = \lim_{t \to 0} \frac{\Delta(t, v, w)}{t^2}, \tag{6} \]

from which symmetry of $D^2 f$ follows.

Fix $t, v, w$ and write $\Delta = g(1) - g(0)$ where

\[ g(s) = f(p + tv + stw) - f(p + stw). \]

Since $f$ is differentiable, so is $g$. By the one-dimensional Mean Value Theorem there
exists $\theta \in (0, 1)$ with $\Delta = g'(\theta)$. By the Chain Rule $g'(\theta)$ can be written in terms of
$Df$ and we get

\[ \Delta = g'(\theta) = (Df)_{p + tv + \theta tw}(tw) - (Df)_{p + \theta tw}(tw). \]

Taylor's estimate applied to the differentiable function $u \mapsto (Df)_u$ at $u = p$ gives

\[ (Df)_{p + x} = (Df)_p + (D^2 f)_p(x, \cdot) + R(x, \cdot) \]

where $R(x, \cdot) \in \mathcal{L}(\mathbb{R}^n, \mathbb{R}^m)$ is sublinear with respect to $x$. Writing out this estimate
for $(Df)_{p + x}$ first with $x = tv + \theta tw$ and then with $x = \theta tw$ gives

\[ \begin{aligned}
\frac{\Delta}{t^2} &= \frac{1}{t}\Big\{ \big[ (Df)_p(w) + (D^2 f)_p(tv + \theta tw, w) + R(tv + \theta tw, w) \big] \\
&\qquad - \big[ (Df)_p(w) + (D^2 f)_p(\theta tw, w) + R(\theta tw, w) \big] \Big\} \\
&= (D^2 f)_p(v, w) + \frac{R(tv + \theta tw, w)}{t} - \frac{R(\theta tw, w)}{t}.
\end{aligned} \]

Bilinearity was used to combine the two second derivative terms. Sublinearity of
$R(x, w)$ with respect to $x$ implies that the last two terms tend to 0 as $t \to 0$, which
completes the proof of (6). Since $(D^2 f)_p$ is the limit of a symmetric (although
nonlinear) function of $v, w$ it too is symmetric. □
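The limit formula (6) can be watched in action. In the sketch below, which is my own illustration with a made-up function $f(x, y) = x^2 y + \sin y$ and hypothetical helper names, the exact value $(D^2 f)_p(v, w) = v^T H w$ is computed from the matrix $H$ of second partials and compared with the signed vertex sum $\Delta(t, v, w)/t^2$ for a small $t$.

```python
import math

def f(x, y):
    return x * x * y + math.sin(y)

def hessian(x, y):
    """Matrix of second partials of f."""
    return [[2 * y, 2 * x], [2 * x, -math.sin(y)]]

def second_derivative(p, v, w):
    """(D^2 f)_p(v, w) computed as v^T H w."""
    H = hessian(*p)
    return sum(v[i] * H[i][j] * w[j] for i in range(2) for j in range(2))

def delta_quotient(p, v, w, t):
    """Delta(t, v, w) / t^2, the signed sum of f over the parallelogram's vertices."""
    def at(a, b):
        # f evaluated at p + a*t*v + b*t*w
        return f(p[0] + t * (a * v[0] + b * w[0]),
                 p[1] + t * (a * v[1] + b * w[1]))
    return (at(1, 1) - at(1, 0) - at(0, 1) + at(0, 0)) / (t * t)

p, v, w = (1.0, 2.0), (1.0, 0.5), (-2.0, 1.0)
exact = second_derivative(p, v, w)
approx_vw = delta_quotient(p, v, w, 1e-4)
approx_wv = delta_quotient(p, w, v, 1e-4)
print(exact, approx_vw, approx_wv)
```

Swapping $v$ and $w$ only reorders the two middle vertices of the parallelogram, which is why the two quotients agree up to rounding.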


Remark The fact that $D^2 f$ can be expressed directly as a limit of values of $f$ is
itself interesting. It should remind you of its one-dimensional counterpart,

\[ f''(x) = \lim_{h \to 0} \frac{f(x + h) + f(x - h) - 2f(x)}{h^2}. \]

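17 Corollary Corresponding mixed second partials of a second-differentiable function
are equal,

\[ \frac{\partial^2 f_k(p)}{\partial x_i\,\partial x_j} = \frac{\partial^2 f_k(p)}{\partial x_j\,\partial x_i}. \]

Proof The equalities

\[ \frac{\partial^2 f_k(p)}{\partial x_i\,\partial x_j} = (D^2 f_k)_p(e_i, e_j) = (D^2 f_k)_p(e_j, e_i) = \frac{\partial^2 f_k(p)}{\partial x_j\,\partial x_i} \]

follow from Theorem 15 and the symmetry of $D^2 f$. □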

The mere existence of the second-order partials does not imply second order 
differentiability, nor does it imply equality of corresponding mixed second partials. 
See Exercise 24. 
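A standard example of this failure, which may or may not be the one intended in the exercise, is $f(x, y) = xy(x^2 - y^2)/(x^2 + y^2)$ with $f(0, 0) = 0$. Its first partials satisfy $\partial f/\partial x\,(0, y) = -y$ and $\partial f/\partial y\,(x, 0) = x$, so the two mixed second partials at the origin are $-1$ and $+1$. The sketch below estimates them with nested central differences; the step sizes are ad hoc choices of mine.

```python
def f(x, y):
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y * (x * x - y * y) / (x * x + y * y)

def fx(x, y, h=1e-6):
    """Central-difference approximation of the x-partial of f."""
    return (f(x + h, y) - f(x - h, y)) / (2 * h)

def fy(x, y, h=1e-6):
    """Central-difference approximation of the y-partial of f."""
    return (f(x, y + h) - f(x, y - h)) / (2 * h)

k = 1e-2  # outer step, chosen much larger than the inner step h
d_dy_of_fx = (fx(0.0, k) - fx(0.0, -k)) / (2 * k)  # y-derivative of df/dx at the origin
d_dx_of_fy = (fy(k, 0.0) - fy(-k, 0.0)) / (2 * k)  # x-derivative of df/dy at the origin
print(d_dy_of_fx, d_dx_of_fy)
```

By Corollary 17 this $f$ cannot be second-differentiable at the origin, even though all of its second partials exist there.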

18 Corollary The $r$th derivative, if it exists, is symmetric: Permutation of the
vectors $v_1, \ldots, v_r$ does not affect the value of $(D^r f)_p(v_1, \ldots, v_r)$. Corresponding mixed
higher-order partials are equal.

Proof The induction argument is left to you as Exercise 29. □


In my opinion Theorem 16 is quite natural even though its proof is tricky. It
proceeds from a pointwise hypothesis to a pointwise conclusion - whenever the second
derivative exists it is symmetric. No assumption is made about continuity of partials.
It is possible that $f$ is second-differentiable at $p$ and nowhere else. See Exercise 25.
All the same, it remains standard to prove equality of mixed partials under stronger
hypotheses, namely, that $D^2 f$ is continuous. See Exercise 27.

We conclude this section with a brief discussion of the rules of higher-order
differentiation. It is simple to check that the $r$th derivative of $f + cg$ is $D^r f + cD^r g$.
Also, if $\beta$ is $k$-linear and $k < r$ then $f(x) = \beta(x, \ldots, x)$ has $D^r f = 0$. On the other
hand, if $k = r$ then $(D^r f)_p = r!\,\mathrm{Symm}(\beta)$ where $\mathrm{Symm}(\beta)$ is the symmetrization of
$\beta$. See Exercise 28.

The Chain Rule for $r$th derivatives is a bit complicated. The difficulties arise from
the fact that $x$ appears in two places in the expression for the first-order Chain Rule,
$(D(g \circ f))_x = (Dg)_{f(x)} \circ (Df)_x$, and so, differentiating this product produces

\[ (D^2 g)_{f(x)} \circ (Df)_x^2 + (Dg)_{f(x)} \circ (D^2 f)_x. \]

(The meaning of $(Df)_x^2$ needs clarification.) Differentiating again produces four
terms, two of which combine. The general formula is

\[ (D^r(g \circ f))_x = \sum_{k=1}^r \sum_\mu (D^k g)_{f(x)} \circ (D^\mu f)_x \]

where the sum on $\mu$ is taken as $\mu$ runs through all partitions of $\{1, \ldots, r\}$ into $k$
disjoint subsets. See Exercise 41.

The higher-order Leibniz rule is left for you as Exercise 42.


Smoothness Classes 

A map $f : U \to \mathbb{R}^m$ is of class $C^r$ if it is $r$th-order differentiable at each $p \in U$ and
its derivatives depend continuously on $p$. (Since differentiability implies continuity,
all the derivatives of order less than $r$ are automatically continuous. Only the $r$th
derivative is in question.) If $f$ is of class $C^r$ for all $r$ then it is smooth or of class
$C^\infty$. According to the differentiation rules, these smoothness classes are closed under
the operations of linear combination, product, and composition. We discuss next how
they are closed under limits.

Let $(f_k)$ be a sequence of $C^r$ functions $f_k : U \to \mathbb{R}^m$. The sequence is

(a) Uniformly $C^r$ convergent if for some $C^r$ function $f : U \to \mathbb{R}^m$ we have

\[ f_k \rightrightarrows f, \quad Df_k \rightrightarrows Df, \quad \ldots, \quad D^r f_k \rightrightarrows D^r f \]

as $k \to \infty$.

(b) Uniformly $C^r$ Cauchy if for each $\epsilon > 0$ there is an $N$ such that for all
$k, \ell \ge N$ and all $x \in U$ we have

\[ |f_k(x) - f_\ell(x)| < \epsilon, \quad \|(Df_k)_x - (Df_\ell)_x\| < \epsilon, \quad \ldots, \quad \|(D^r f_k)_x - (D^r f_\ell)_x\| < \epsilon. \]
19 Theorem Uniform $C^r$ convergence and Cauchyness are equivalent.

Proof Convergence always implies the Cauchy condition. As for the converse, first
assume that $r = 1$. We know that $f_k$ converges uniformly to a continuous function $f$
and the derivative sequence converges uniformly to a continuous limit

\[ Df_k \rightrightarrows G. \]

We claim that $Df = G$. Fix $p \in U$ and consider points $q$ in a small convex
neighborhood of $p$. The $C^1$ Mean Value Theorem gives, for each $k$,

\[ f_k(q) - f_k(p) = \int_0^1 (Df_k)_{p + t(q - p)}\,dt\,(q - p), \]

and uniform convergence lets us pass to the limit on both sides as $k \to \infty$ to get

\[ f(q) - f(p) = \int_0^1 G(p + t(q - p))\,dt\,(q - p). \]

This integral of $G$ is a continuous function of $q$ that reduces to $G(p)$ when $p = q$. By
the converse part of the $C^1$ Mean Value Theorem, $f$ is differentiable and $Df = G$.
Therefore $f$ is $C^1$ and $f_k$ converges $C^1$ uniformly to $f$ as $k \to \infty$, completing the
proof when $r = 1$.

Now suppose that $r \ge 2$. The maps $Df_k : U \to \mathcal{L}$ form a uniformly $C^{r-1}$ Cauchy
sequence. The limit, by induction, is $C^{r-1}$ uniform; i.e., as $k \to \infty$ we have

\[ D^s(Df_k) \rightrightarrows D^s G \]

for all $s \le r - 1$. Hence $f_k$ converges $C^r$ uniformly to $f$ as $k \to \infty$, completing the
induction. □


The $C^r$ norm of a $C^r$ function $f : U \to \mathbb{R}^m$ is

\[ \|f\|_r = \max\Big\{ \sup_{x \in U} |f(x)|, \ \ldots, \ \sup_{x \in U} \|(D^r f)_x\| \Big\}. \]

The set of functions with $\|f\|_r < \infty$ is denoted $C^r(U, \mathbb{R}^m)$.

20 Corollary $\|\ \|_r$ makes $C^r(U, \mathbb{R}^m)$ a Banach space - a complete normed vector
space.

Proof The norm properties are easy to check; completeness follows from Theorem 19. □
21 $C^r$ M-test If $\sum M_k$ is a convergent series of constants and if $\|f_k\|_r \le M_k$ for all
$k$ then the series of functions $\sum f_k$ converges in $C^r(U, \mathbb{R}^m)$ to a function $f$. Term-by-term
differentiation of order $\le r$ is valid, i.e., for all $s \le r$ we have $D^s f = \sum_k D^s f_k$.

Proof Obvious from the preceding corollary. □


4 Implicit and Inverse Functions 

Let $f : U \to \mathbb{R}^m$ be given, where $U$ is an open subset of $\mathbb{R}^n \times \mathbb{R}^m$. Fix attention on
a point $(x_0, y_0) \in U$ and write $f(x_0, y_0) = z_0$. Our goal is to solve the equation

\[ f(x, y) = z_0 \tag{7} \]

near $(x_0, y_0)$. More precisely, we hope to show that the set of points $(x, y)$ near
$(x_0, y_0)$ at which $f(x, y) = z_0$, the so-called $z_0$-locus of $f$, is the graph of a function
$y = g(x)$. If so, $g$ is the implicit function defined by (7). See Figure 110.

Figure 110 Near $(x_0, y_0)$ the $z_0$-locus of $f$ is the graph of a function $y = g(x)$.

Under various hypotheses we will show that $g$ exists, is unique, and is differentiable.
The main assumption, which we make throughout this section, is that

\[ \text{the } m \times m \text{ matrix } B = \left[ \frac{\partial f_i(x_0, y_0)}{\partial y_j} \right] \text{ is invertible.} \]

Equivalently the linear transformation that $B$ represents is an isomorphism $\mathbb{R}^m \to \mathbb{R}^m$.
22 Implicit Function Theorem If the function $f$ above is $C^r$, $1 \le r \le \infty$, then
near $(x_0, y_0)$, the $z_0$-locus of $f$ is the graph of a unique function $y = g(x)$. Besides,
$g$ is $C^r$.

Proof Without loss of generality we suppose that $(x_0, y_0)$ is the origin in $\mathbb{R}^n \times \mathbb{R}^m$
and $z_0 = 0$ in $\mathbb{R}^m$. The Taylor expression for $f$ is

\[ f(x, y) = Ax + By + R \]

where $A$ is the $m \times n$ matrix

\[ A = \left[ \frac{\partial f_i(x_0, y_0)}{\partial x_j} \right] \]

and $R$ is sublinear. Solving $f(x, y) = 0$ for $y = gx$ is equivalent to solving

\[ y = -B^{-1}(Ax + R(x, y)). \tag{8} \]

In the unlikely event that $R$ does not depend on $y$, (8) is an explicit formula for
$gx$ and the implicit function is an explicit function. In general, the idea is that the
remainder $R$ depends so weakly on $y$ that we can switch it to the left-hand side of
(8), absorbing it in the $y$-term.

Solving (8) for $y$ as a function of $x$ is the same as finding a fixed-point of

\[ K_x : y \mapsto -B^{-1}(Ax + R(x, y)), \]

so we hope to show that $K_x$ contracts. The remainder $R$ is a $C^1$ function, and
$(DR)_{(0,0)} = 0$. Therefore if $r$ is small and $|x|, |y| \le r$ then

\[ \|B^{-1}\| \left\| \frac{\partial R(x, y)}{\partial y} \right\| \le \frac{1}{2}. \]

By the Mean Value Theorem this implies that

\[ |K_x(y_1) - K_x(y_2)| \le \|B^{-1}\|\,|R(x, y_1) - R(x, y_2)|
 \le \|B^{-1}\| \left\| \frac{\partial R}{\partial y} \right\| |y_1 - y_2| \le \frac{1}{2}\,|y_1 - y_2| \]

for $|x|, |y_1|, |y_2| \le r$. Due to continuity at the origin, if $|x| \le \tau \ll r$ then

\[ |K_x(0)| \le \frac{r}{2}. \]

Thus, for each $x \in X$, $K_x$ contracts $Y$ into itself where $X$ is the $\tau$-neighborhood of 0
in $\mathbb{R}^n$ and $Y$ is the closure of the $r$-neighborhood of 0 in $\mathbb{R}^m$. See Figure 111.
Figure 111 $K_x$ contracts $Y$ into itself.

By the Contraction Mapping Principle, $K_x$ has a unique fixed point $g(x)$ in $Y$.
This implies that near the origin, the zero locus of $f$ is the graph of a function
$y = g(x)$.
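The fixed-point construction is also a practical recipe. Here is a minimal sketch on a toy equation of my own choosing, with $n = m = 1$ and $f(x, y) = x + 2y + y^2$, so that $A = 1$, $B = 2$ and $R(x, y) = y^2$. Iterating $K_x$ from $y = 0$ converges to the implicit function value, which for this $f$ can be compared with the closed form $y = -1 + \sqrt{1 - x}$.

```python
import math

A, B = 1.0, 2.0  # partials of f at the origin: df/dx and df/dy

def R(x, y):
    """Taylor remainder of f(x, y) = x + 2y + y^2 at the origin."""
    return y * y

def K(x, y):
    """The map K_x : y -> -B^{-1}(A x + R(x, y))."""
    return -(A * x + R(x, y)) / B

def implicit_g(x, iterations=60):
    """Iterate K_x starting from y = 0; the limit is the fixed point g(x)."""
    y = 0.0
    for _ in range(iterations):
        y = K(x, y)
    return y

x = 0.1
y = implicit_g(x)
closed_form = -1.0 + math.sqrt(1.0 - x)  # root of y^2 + 2y + x = 0 near 0
residual = x + 2 * y + y * y             # f(x, g(x)), which should vanish
print(y, closed_form, residual)
```

Near the fixed point the contraction factor is about $|y|$, which is small here, so the iteration settles to machine precision after only a handful of steps.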

It remains to check that $g$ is $C^r$. First we show that $g$ obeys a Lipschitz condition
at 0. We have

\[ \begin{aligned}
|gx| = |K_x(gx) - K_x(0) + K_x(0)| &\le \operatorname{Lip}(K_x)\,|gx - 0| + |K_x(0)| \\
&\le \frac{|gx|}{2} + |B^{-1}(Ax + R(x, 0))| \le \frac{|gx|}{2} + 2L|x|
\end{aligned} \]

where $L = \|B^{-1}\|\,\|A\|$ and $|x|$ is small. Thus $g$ satisfies the Lipschitz condition

\[ |gx| \le 4L|x|. \]

In particular $g$ is continuous at $x = 0$.

Note the trick here. The term $|gx|$ appears on both sides of the inequality but
since its coefficient on the r.h.s. is smaller than that on the l.h.s., they combine to
give a nontrivial inequality.

By the Chain Rule, the derivative of $g$ at the origin, if it does exist, must satisfy
$A + B(Dg)_0 = 0$, so we aim to show that $(Dg)_0 = -B^{-1}A$. Since $gx$ is a fixed-point
of $K_x$ we have $gx = -B^{-1}(Ax + R)$ and the Taylor estimate for $g$ at the origin is

\[ \begin{aligned}
|g(x) - g(0) - (-B^{-1}Ax)| = |B^{-1}R(x, gx)| &\le \|B^{-1}\|\,|R(x, gx)| \\
&\le \|B^{-1}\|\,e(x, gx)\,(|x| + |gx|) \\
&\le \|B^{-1}\|\,e(x, gx)\,(1 + 4L)\,|x|
\end{aligned} \]

where $e(x, y) \to 0$ as $(x, y) \to (0, 0)$. Since $gx \to 0$ as $x \to 0$, the error factor $e(x, gx)$
does tend to 0 as $x \to 0$, the remainder is sublinear with respect to $x$, and $g$ is
differentiable at 0 with $(Dg)_0 = -B^{-1}A$.

All facts proved at the origin hold equally at points $(x,y)$ on the zero locus near the origin. For the origin is nothing special. Thus, $g$ is differentiable at $x$ and $(Dg)_x = -B_x^{-1} \circ A_x$ where

$$A_x = \frac{\partial f(x, gx)}{\partial x} \qquad B_x = \frac{\partial f(x, gx)}{\partial y}.$$

Since $gx$ is continuous (being differentiable) and $f$ is $C^1$, $A_x$ and $B_x$ are continuous functions of $x$. According to Cramer’s Rule for finding the inverse of a matrix, the entries of $B_x^{-1}$ are explicit, algebraic functions of the entries of $B_x$, and therefore they depend continuously on $x$. Therefore $g$ is $C^1$.

To complete the proof that $g$ is $C^r$ we apply induction. For $2 \le r < \infty$, assume the theorem is true for $r - 1$. When $f$ is $C^r$ this implies that $g$ is $C^{r-1}$. Because they are composites of $C^{r-1}$ functions, $A_x$ and $B_x$ are $C^{r-1}$. Because the entries of $B_x^{-1}$ depend algebraically on the entries of $B_x$, $B_x^{-1}$ is also $C^{r-1}$. Therefore $(Dg)_x$ is $C^{r-1}$ and $g$ is $C^r$. If $f$ is $C^\infty$, we have just shown that $g$ is $C^r$ for all finite $r$ and thus $g$ is $C^\infty$. □


Exercises 35 and 36 discuss the properties of matrix inversion avoiding Cramer’s 
Rule and finite dimensionality. 


Next we are going to deduce the Inverse Function Theorem from the Implicit Function Theorem. A fair question is: Since they turn out to be equivalent theorems, why not do it the other way around? Well, in my own experience the Implicit Function Theorem is more basic and flexible. I have at times needed forms of the Implicit Function Theorem with weaker differentiability hypotheses respecting $x$ than $y$ and they do not follow from the Inverse Function Theorem. For example, if we merely assume that $B = \partial f(x_0, y_0)/\partial y$ is invertible, that $\partial f(x,y)/\partial x$ is a continuous function of $(x,y)$, and that $f$ is continuous (or Lipschitz) then the local implicit function of $f$ is continuous (or Lipschitz). It is not necessary to assume that $f$ is of class $C^1$.


Just as a homeomorphism is a continuous bijection whose inverse is continuous, so a $C^r$ diffeomorphism is a $C^r$ bijection whose inverse is $C^r$, $1 \le r \le \infty$. The inverse being $C^r$ is not automatic. The example to remember is $f(x) = x^3$. It is a $C^\infty$ bijection $\mathbb{R} \to \mathbb{R}$ and is a homeomorphism but not a diffeomorphism because its inverse fails to be differentiable at the origin. Since differentiability implies continuity, every diffeomorphism is a homeomorphism.

Diffeomorphisms are to $C^r$ things as isomorphisms are to algebraic things. The sphere and ellipsoid are diffeomorphic under a diffeomorphism $\mathbb{R}^3 \to \mathbb{R}^3$ but the sphere and the surface of the cube are only homeomorphic, not diffeomorphic.

23 Inverse Function Theorem If the derivative of f is invertible then f is a local 
diffeomorphism. 

Proof Invertibility of a matrix implies the matrix is square, so $m = n$. Then we have $f : U \to \mathbb{R}^m$, where $U$ is an open subset of $\mathbb{R}^m$, and at some $p \in U$, $(Df)_p$ is assumed to be invertible. We assume $f$ is $C^r$, $1 \le r \le \infty$, and set

$$F(x, y) = f(x) - y \qquad q = f(p)$$

for $(x,y) \in U \times \mathbb{R}^m$. Clearly $F$ is $C^r$, $F(p,q) = 0$, and the derivative of $F$ with respect to $x$ at $(p,q)$ is $(Df)_p$.

Since $(Df)_p$ is an isomorphism we can apply the Implicit Function Theorem (with $x$ and $y$ interchanged!) to find neighborhoods $U_p$ of $p$ and $V_q$ of $q$ and a $C^r$ implicit function $h : V_q \to U_p$ uniquely defined by the equation

$$F(hy, y) = f(hy) - y = 0.$$

This means that $h$ is a “local right inverse” for $f$ in the sense that $f \circ h = \mathrm{id}|_{V_q}$. Since $F(p,q) = 0$, uniqueness implies $p = hq$, and $(Df)_p \circ (Dh)_q = I$ implies $(Dh)_q$ is invertible.

We claim that $h$ is also a “local left inverse” for $f$, and hence that $f$ is a local diffeomorphism. We can apply the same analysis with $h$ in place of $f$ since it is $C^r$, it sends $q$ to $p$, and its derivative at $q$ is invertible. Consequently $h$ has a unique local right inverse, say $g$. It satisfies $h \circ g = \mathrm{id}$ locally and we get

$$f = f \circ (h \circ g) = (f \circ h) \circ g = g.$$

Thus $h \circ f = h \circ g = \mathrm{id}$ shows that $h$ is a local left inverse for $f$ and we have $h = f^{-1}$ on a neighborhood of $q$. □


5* The Rank Theorem 

The rank of a linear transformation $T : \mathbb{R}^n \to \mathbb{R}^m$ is the dimension of its range. In terms of matrices, the rank is the size of the largest minor with nonzero determinant. If $T$ is onto then its rank is $m$. If it is one-to-one then its rank is $n$. A standard formula in linear algebra states that

$$\operatorname{rank} T + \operatorname{nullity} T = n$$

where nullity is the dimension of the kernel of $T$. A differentiable function $f : U \to \mathbb{R}^m$ has constant rank $k$ if for all $p \in U$ the rank of $(Df)_p$ is $k$.

An important property of rank is that if $T$ has rank $k$ and $\|S - T\|$ is small then $S$ has rank $\ge k$. The rank of $T$ can increase under a small perturbation of $T$ but it cannot decrease. Thus, if $f$ is $C^1$ and $(Df)_p$ has rank $k$ then automatically $(Df)_x$ has rank $\ge k$ for all $x$ near $p$. See Exercise 43.
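The behavior of rank under small perturbations can be seen on a tiny example. The following is a sketch, not a robust numerical method: the helper `rank`, the tolerance `tol`, and the matrices `T`, `S_up`, `S_same` are assumptions made for illustration.

```python
def rank(matrix, tol=1e-9):
    """Rank of a small dense matrix by Gaussian elimination with partial pivoting."""
    rows = [list(map(float, row)) for row in matrix]
    n_rows = len(rows)
    n_cols = len(rows[0]) if rows else 0
    r = 0  # number of pivots found so far
    for c in range(n_cols):
        if r == n_rows:
            break
        # choose the largest available pivot in column c
        pivot = max(range(r, n_rows), key=lambda i: abs(rows[i][c]))
        if abs(rows[pivot][c]) <= tol:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, n_rows):
            factor = rows[i][c] / rows[r][c]
            for j in range(c, n_cols):
                rows[i][j] -= factor * rows[r][j]
        r += 1
    return r


eps = 1e-3
T = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]            # rank 1
S_up = [[1.0, 0.0, 0.0], [0.0, eps, 0.0]]         # small perturbation, rank rises to 2
S_same = [[1.0 + eps, 0.0, 0.0], [0.0, 0.0, 0.0]]  # small perturbation, rank stays 1
```

In both perturbations the nonzero $1 \times 1$ minor of `T` stays nonzero, which is why the rank cannot drop below 1.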

The Rank Theorem describes maps of constant rank. It says that locally they are just like linear projections. To formalize this we say that maps $f : A \to B$ and $g : C \to D$ are equivalent (for want of a better word) if there are bijections $\alpha : A \to C$ and $\beta : B \to D$ such that $g = \beta \circ f \circ \alpha^{-1}$. An elegant way to express this equation is a commutative diagram

$$\begin{array}{ccc} A & \xrightarrow{\;f\;} & B \\ {\scriptstyle \alpha}\downarrow & & \downarrow{\scriptstyle \beta} \\ C & \xrightarrow{\;g\;} & D. \end{array}$$

Commutativity means that for each $a \in A$ we have $\beta(f(a)) = g(\alpha(a))$. Following the maps around the rectangle clockwise from $A$ to $D$ gives the same result as following them around it counterclockwise. The $\alpha$, $\beta$ are “changes of variable.” If $f$, $g$ are $C^r$ and $\alpha$, $\beta$ are $C^r$ diffeomorphisms, $1 \le r \le \infty$, then $f$ and $g$ are said to be $C^r$ equivalent, and we write $f \approx_r g$. As $C^r$ maps, $f$ and $g$ are indistinguishable.

24 Lemma $C^r$ equivalence is an equivalence relation and it has no effect on rank.

Proof Since diffeomorphisms form a group, $\approx_r$ is an equivalence relation. Also, if $g = \beta \circ f \circ \alpha^{-1}$ then the chain rule implies

$$Dg = D\beta \circ Df \circ D\alpha^{-1}.$$

Since $D\beta$ and $D\alpha^{-1}$ are isomorphisms, $Df$ and $Dg$ have equal rank. □
The linear projection $P : \mathbb{R}^n \to \mathbb{R}^m$

$$P(x_1, \dots, x_n) = (x_1, \dots, x_k, 0, \dots, 0)$$

has rank $k$. It projects $\mathbb{R}^n$ onto the $k$-dimensional subspace $\mathbb{R}^k \times 0 \subset \mathbb{R}^m$. (We assume that $k \le n, m$.) The $m \times n$ matrix of $P$ is

$$\begin{bmatrix} I_{k \times k} & 0 \\ 0 & 0 \end{bmatrix}.$$

25 Rank Theorem Locally, a $C^r$ constant-rank-$k$ map is $C^r$ equivalent to a linear projection onto a $k$-dimensional subspace.


As an example, think of the radial projection $\pi : \mathbb{R}^3 \setminus \{0\} \to S^2$, where $\pi(v) = v/|v|$. It has constant rank 2, and is locally indistinguishable from linear projection of $\mathbb{R}^3$ to the $(x,y)$-plane.


Proof Let $f : U \to \mathbb{R}^m$ have constant rank $k$ and let $p \in U$ be given. We will show that on a neighborhood of $p$ we have $f \approx_r P$.

Step 1. Define translations of $\mathbb{R}^n$ and $\mathbb{R}^m$ by

$$\tau : \mathbb{R}^n \to \mathbb{R}^n,\quad z \mapsto z + p \qquad\qquad \tau' : \mathbb{R}^m \to \mathbb{R}^m,\quad z' \mapsto z' - fp.$$

The translations are diffeomorphisms of $\mathbb{R}^n$ and $\mathbb{R}^m$ and they show that $f$ is $C^r$ equivalent to $\tau' \circ f \circ \tau$, a $C^r$ map that sends 0 to 0 and has constant rank $k$. Thus, it is no loss of generality to assume in the first place that $p$ is the origin in $\mathbb{R}^n$ and $fp$ is the origin in $\mathbb{R}^m$. We do so.

Step 2. Let $T : \mathbb{R}^n \to \mathbb{R}^n$ be an isomorphism that sends $0 \times \mathbb{R}^{n-k}$ onto the kernel of $(Df)_0$. Since the kernel has dimension $n - k$, there is such a $T$. Let $T' : \mathbb{R}^m \to \mathbb{R}^m$ be an isomorphism that sends the image of $(Df)_0$ onto $\mathbb{R}^k \times 0$. Since $(Df)_0$ has rank $k$, there is such a $T'$. Then $f \approx_r T' \circ f \circ T$. This map sends the origin in $\mathbb{R}^n$ to the origin in $\mathbb{R}^m$, while its derivative at the origin has kernel $0 \times \mathbb{R}^{n-k}$ and range $\mathbb{R}^k \times 0$. Thus it is no loss of generality to assume in the first place that $f$ has these properties. We do so.

Step 3. Write

$$(x, y) \in \mathbb{R}^k \times \mathbb{R}^{n-k}$$

$$f(x, y) = (f_X(x,y),\, f_Y(x,y)) \in \mathbb{R}^k \times \mathbb{R}^{m-k}.$$



We are going to find a $g \approx_r f$ such that

$$g(x, 0) = (x, 0).$$

The matrix of $(Df)_0$ is

$$\begin{bmatrix} A & 0 \\ 0 & 0 \end{bmatrix}$$

where $A$ is $k \times k$ and invertible. By the Inverse Function Theorem the map

$$\sigma : x \mapsto f_X(x, 0)$$

is a diffeomorphism $\sigma : X \to X'$ where $X$ and $X'$ are small neighborhoods of the origin in $\mathbb{R}^k$ and $f_X$ denotes the first $k$ components of $f$. For $x' \in X'$, set

$$h(x') = f_Y(\sigma^{-1}(x'), 0).$$

This makes $h$ a $C^r$ map $X' \to \mathbb{R}^{m-k}$, and

$$h(\sigma(x)) = f_Y(x, 0)$$

where $f_Y$ denotes the final $m - k$ components of $f$. The image of $X \times 0$ under $f$ is the graph of $h$. For

$$\begin{aligned} f(X \times 0) &= \{f(x, 0) : x \in X\} = \{(f_X(x,0),\, f_Y(x,0)) : x \in X\} \\ &= \{(f_X(\sigma^{-1}(x'), 0),\, f_Y(\sigma^{-1}(x'), 0)) : x' \in X'\} \\ &= \{(x', h(x')) : x' \in X'\}. \end{aligned}$$

See Figure 112.

Figure 112 The $f$-image of $X \times 0$ is the graph of $h$.
If $(x', y') \in X' \times \mathbb{R}^{m-k}$ then we define

$$\psi(x', y') = (\sigma^{-1}(x'),\, y' - h(x')).$$

Since $\psi$ is the composite of $C^r$ diffeomorphisms,

$$(x', y') \mapsto (x',\, y' - h(x')) \mapsto (\sigma^{-1}(x'),\, y' - h(x')),$$

it too is a $C^r$ diffeomorphism. (Alternatively, you could compute the derivative of $\psi$ at the origin and apply the Inverse Function Theorem.) We observe that $g = \psi \circ f \approx_r f$ satisfies

$$\begin{aligned} g(x, 0) &= \psi \circ (f_X(x,0),\, f_Y(x,0)) \\ &= (\sigma^{-1} \circ f_X(x,0),\; f_Y(x,0) - h(f_X(x,0))) = (x, 0). \end{aligned}$$

Thus it is no loss of generality to assume in the first place that $f(x,0) = (x,0)$. We do so. (This means that $f$ sends the $k$-plane $\mathbb{R}^k \times 0 \subset \mathbb{R}^n$ into the $k$-plane $\mathbb{R}^k \times 0 \subset \mathbb{R}^m$.)

Step 4. Finally, we find a local diffeomorphism $\varphi$ in the neighborhood of 0 in $\mathbb{R}^n$ so that $f \circ \varphi$ is the projection map $P(x, y) = (x, 0)$.

Define $F(\xi, x, y) = f_X(\xi, y) - x$. It is a map from $\mathbb{R}^k \times \mathbb{R}^k \times \mathbb{R}^{n-k}$ into $\mathbb{R}^k$. The equation

$$F(\xi, x, y) = 0$$

defines $\xi = \xi(x, y)$ implicitly in a neighborhood of the origin. For at the origin the derivative of $F$ with respect to $\xi$ is the invertible matrix $I_{k \times k}$. Thus $\xi$ is a $C^r$ map from $\mathbb{R}^n$ into $\mathbb{R}^k$ and $\xi(0,0) = 0$. We claim that

$$\varphi(x, y) = (\xi(x,y),\, y)$$

is a local diffeomorphism of $\mathbb{R}^n$ and $G = f \circ \varphi$ is $P$.

The derivative of $\xi(x, y)$ with respect to $x$ at the origin can be calculated from the Chain Rule (this was done in general for implicit functions) and since $F(\xi, x, y) \equiv 0$ we have

$$0 = \frac{dF(\xi(x,y), x, y)}{dx} = \frac{\partial F}{\partial \xi}\frac{\partial \xi}{\partial x} + \frac{\partial F}{\partial x} = I_{k \times k}\,\frac{\partial \xi}{\partial x} - I_{k \times k}.$$

That is, at the origin $\partial \xi/\partial x$ is the identity matrix. Thus,

$$(D\varphi)_0 = \begin{bmatrix} I_{k \times k} & * \\ 0 & I_{(n-k) \times (n-k)} \end{bmatrix}$$

which is invertible no matter what $*$ is. Clearly $\varphi(0) = 0$. By the Inverse Function Theorem, $\varphi$ is a local $C^r$ diffeomorphism on a neighborhood of the origin and $G$ is $C^r$ equivalent to $f$. By Lemma 24, $G$ has constant rank $k$.

We have

$$\begin{aligned} G(x, y) &= f \circ \varphi(x, y) = f(\xi(x,y), y) \\ &= (f_X(\xi, y),\, f_Y(\xi, y)) = (x,\, G_Y(x,y)). \end{aligned}$$

Therefore $G_X(x, y) = x$ and

$$DG = \begin{bmatrix} I_{k \times k} & 0 \\ * & \dfrac{\partial G_Y}{\partial y} \end{bmatrix}.$$

At last we use the constant-rank hypothesis. (Until now, it has been enough that $Df$ has rank $\ge k$.) The only way that a matrix of this form can have rank $k$ is that

$$\frac{\partial G_Y}{\partial y} \equiv 0.$$

See Exercise 43. By Corollary 13 to the Mean Value Theorem this implies that in a neighborhood of the origin, $G_Y$ is independent of $y$. Thus

$$G_Y(x, y) = G_Y(x, 0) = f_Y(\xi(x, 0), 0),$$

which is 0 because (by Step 3) $f_Y = 0$ on $\mathbb{R}^k \times 0$. The upshot is that $G \approx_r f$ and $G(x, y) = (x, 0)$; i.e., $G = P$. See also Exercise 31. By Lemma 24, steps 1-4 concatenate to give a $C^r$ equivalence between the original constant-rank map $f$ and the linear projection $P$. □



In the following three corollaries U is an open subset of R n . 

26 Corollary If $f : U \to \mathbb{R}^m$ has rank $k$ at $p$ then it is locally $C^r$ equivalent to a map of the form $G(x, y) = (x, g(x, y))$ where $g : \mathbb{R}^n \to \mathbb{R}^{m-k}$ is $C^r$ and $x \in \mathbb{R}^k$.

Proof This was shown in the proof of the Rank Theorem before we used the assumption that $f$ has constant rank $k$. □

27 Corollary If $f : U \to \mathbb{R}$ is $C^r$ and $(Df)_p$ has rank 1 then in a neighborhood of $p$ the level sets $\{x \in U : f(x) = c\}$ form a stack of $C^r$ nonlinear discs of dimension $n - 1$.





Figure 113 Near a rank-one point, the level sets of $f : U \to \mathbb{R}$ are diffeomorphic to a stack of $(n-1)$-dimensional planes.

Proof Near $p$ the rank cannot decrease, so $f$ has constant rank 1 near $p$. The level sets of a projection $\mathbb{R}^n \to \mathbb{R}$ form a stack of $(n-1)$-dimensional planes and the level sets of $f$ are the images of these planes under the equivalence diffeomorphism in the Rank Theorem. See Figure 113. □

28 Corollary If $f : U \to \mathbb{R}^m$ has rank $n$ at $p$ then locally the image of $U$ under $f$ is a diffeomorphic copy of the $n$-dimensional disc.

Proof Near $p$ the rank cannot decrease, so $f$ has constant rank $n$ near $p$. The Rank Theorem says that $f$ is locally $C^r$ equivalent to $x \mapsto (x, 0)$. (Since $k = n$, the $y$-coordinates are absent.) Thus the local image of $U$ is diffeomorphic to a neighborhood of 0 in $\mathbb{R}^n \times 0$ which is an $n$-dimensional disc. □


The geometric meaning of the diffeomorphisms $\psi$ and $\varphi$ is illustrated in Figures 114 and 115.

Figure 114 $f$ has constant rank 1.

Figure 115 $f$ has constant rank 2.

6* Lagrange Multipliers

In sophomore calculus you learn how to maximize a function $f(x,y,z)$ subject to a “constraint” or “side condition” $g(x,y,z) = \text{constant}$ by the Lagrange multiplier method. Namely, the maximum can occur only at a point $p$ where the gradient of $f$ is a scalar multiple of the gradient of $g$,

$$\operatorname{grad}_p f = \lambda \operatorname{grad}_p g.$$

The factor $\lambda$ is the Lagrange multiplier. The goal of this section is a natural, mathematically complete explanation of the Lagrange multiplier method which amounts to gazing at the right picture.

First, the natural hypotheses are

(a) $f$ and $g$ are $C^1$ real-valued functions defined on some region $U \subset \mathbb{R}^3$.

(b) For some constant $c$, the set $S = g^{\mathrm{pre}}(c)$ is compact, nonempty, and $\operatorname{grad}_q g \ne 0$ for all $q \in S$.

The conclusion is

(c) The restriction of $f$ to the set $S$, $f|_S$, has a maximum, say $M$, and if $p \in S$ has $f(p) = M$ then there is a $\lambda$ such that $\operatorname{grad}_p f = \lambda \operatorname{grad}_p g$.

The method is utilized as follows. You are given† $f$ and $g$, and you are asked to find a point $p \in S$ at which $f|_S$ is maximum. Compactness implies that a maximum point exists. Your job is to find it. You first locate all points $q \in S$ at which the gradients of $f$ and $g$ are linearly dependent; i.e., one gradient is a scalar multiple of the other. They are “candidates” for the maximum point. You then evaluate $f$ at each candidate and the one with the largest $f$-value is the maximum. Done.

Of course you can find the minimum the same way. It too will be among the candidates, and it will have the smallest $f$-value. In fact, the candidates are exactly the critical points of $f|_S$, the points $x \in S$ such that

$$\frac{f(y) - f(x)}{|y - x|} \to 0$$

as $y \in S$ tends to $x$.

†Sometimes you are merely given $f$ and $S$. Then you must think up an appropriate $g$ such that (b) is true.
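The candidate procedure just described can be sketched on a concrete case. Everything specific below is an assumption made for illustration: the objective $f(x,y,z) = x + y + z$, the constraint $g(x,y,z) = x^2 + y^2 + z^2 = 1$, the two candidates found by hand from $\operatorname{grad} f = \lambda \operatorname{grad} g$, and the random sampling used as a sanity check.

```python
import math
import random


def f(p):
    x, y, z = p
    return x + y + z


def grad_f(p):
    return (1.0, 1.0, 1.0)


def grad_g(p):
    # g(x, y, z) = x**2 + y**2 + z**2, constrained to g = 1
    x, y, z = p
    return (2.0 * x, 2.0 * y, 2.0 * z)


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


# Solving grad f = lambda * grad g together with g = 1 by hand gives two candidates.
s = 1.0 / math.sqrt(3.0)
candidates = [(s, s, s), (-s, -s, -s)]

# At each candidate the gradients are linearly dependent: their cross product vanishes.
dependence = [max(abs(c) for c in cross(grad_f(q), grad_g(q))) for q in candidates]

maximum = max(f(q) for q in candidates)
minimum = min(f(q) for q in candidates)


def random_point_on_sphere(rng):
    """A point of S obtained by normalizing a Gaussian vector."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 1e-12:
            return tuple(c / norm for c in v)


rng = random.Random(0)
samples = [f(random_point_on_sphere(rng)) for _ in range(1000)]
```

The sampled values of $f$ on the sphere all lie between the smallest and largest candidate values, as the method predicts.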
Now we explain why the Lagrange multiplier method works. Recall that the gradient of a function $h(x, y, z)$ at $p \in U$ is the vector

$$\operatorname{grad}_p h = \left( \frac{\partial h(p)}{\partial x},\ \frac{\partial h(p)}{\partial y},\ \frac{\partial h(p)}{\partial z} \right) \in \mathbb{R}^3.$$

Assume hypotheses (a), (b) and that $f|_S$ attains its maximum value $M$ at $p \in S$. We must prove (c) - the gradient of $f$ at $p$ is a scalar multiple of the gradient of $g$ at $p$. If $\operatorname{grad}_p f = 0$ then $\operatorname{grad}_p f = 0 \cdot \operatorname{grad}_p g$, which verifies (c) degenerately. Thus it is fair to assume that $\operatorname{grad}_p f \ne 0$.

By the Rank Theorem, in the neighborhood of a point at which the gradient of $f$ is nonzero, the $f$-level surfaces are like a stack of pancakes. (The pancakes are infinitely thin and may be somewhat curved. Alternatively, you can picture the level surfaces as layers of an onion skin or as a pile of transparency foils.)

To arrive at a contradiction, assume that $\operatorname{grad}_p f$ is not a scalar multiple of $\operatorname{grad}_p g$. The angle between the gradients is nonzero. Gaze at the $f$-level surfaces $f = M \pm \epsilon$ for $\epsilon$ small. The way these $f$-level surfaces meet the $g$-level surface $S$ is shown in Figure 116.

The surface $S$ is a knife blade that slices through the $f$-pancakes. The knife blade is perpendicular to $\operatorname{grad} g$, while the pancakes are perpendicular to $\operatorname{grad} f$. There is a positive angle between these gradient vectors, so the knife is not tangent to the pancakes. Rather, $S$ slices transversely through each $f$-level surface near $p$, and $S \cap \{f = M + \epsilon\}$ is a curve that passes near $p$. The value of $f$ on this curve is $M + \epsilon$, which contradicts the assumption that $f|_S$ attains a maximum at $p$. Therefore $\operatorname{grad}_p f$ is, after all, a scalar multiple of $\operatorname{grad}_p g$ and the proof of (c) is complete.

There is a higher-dimensional version of the Lagrange multiplier method. A $C^1$ function $f : U \to \mathbb{R}$ is defined on an open set $U \subset \mathbb{R}^n$, and it is constrained to a compact “surface” $S \subset U$ defined by $k$ simultaneous equations

$$g_1(x_1, \dots, x_n) = c_1, \quad \dots, \quad g_k(x_1, \dots, x_n) = c_k.$$

We assume the functions $g_i$ are $C^1$ and their gradients are linearly independent. The higher-dimensional Lagrange multiplier method asserts that if $f|_S$ achieves a maximum at $p$ then $\operatorname{grad}_p f$ is a linear combination of $\operatorname{grad}_p g_1, \dots, \operatorname{grad}_p g_k$. In contrast to Protter and Morrey’s presentation on pages 369-372 of their book, A First Course in Real Analysis, the proof is utterly simple: It amounts to examining the situation in the right coordinate system at $p$.

Figure 116 $S$ cuts through all the $f$-level surfaces near $p$.

It is no loss of generality to assume that $p$ is the origin in $\mathbb{R}^n$ and that $c_1, \dots, c_k$, $f(p)$ are zero. Also, we can assume that $\operatorname{grad}_0 f \ne 0$, since otherwise it is already a trivial linear combination of the gradients of the $g_i$. Then choose vectors $w_{k+2}, \dots, w_n$ so that

$$\operatorname{grad}_0 g_1, \ \dots, \ \operatorname{grad}_0 g_k, \ \operatorname{grad}_0 f, \ w_{k+2}, \ \dots, \ w_n$$

is a vector basis of $\mathbb{R}^n$. For $k + 2 \le i \le n$ define

$$h_i(x) = \langle w_i, x \rangle.$$

The map $x \mapsto F(x) = (g_1(x), \dots, g_k(x), f(x), h_{k+2}(x), \dots, h_n(x))$ is a local diffeomorphism of $\mathbb{R}^n$ to itself since the derivative of $F$ at the origin is the $n \times n$ matrix of linearly independent column vectors

$$(DF)_0 = [\, \operatorname{grad}_0 g_1 \ \dots \ \operatorname{grad}_0 g_k \ \ \operatorname{grad}_0 f \ \ w_{k+2} \ \dots \ w_n \,].$$

Think of the functions $y_i = F_i(x)$ as new coordinates on a neighborhood of the origin in $\mathbb{R}^n$. With respect to these coordinates, the surface $S$ is the coordinate plane $0 \times \mathbb{R}^{n-k}$ on which the coordinates $y_1, \dots, y_k$ are zero and $f$ is the $(k+1)^{\text{st}}$ coordinate function $y_{k+1}$. This coordinate function obviously does not attain a maximum on the coordinate plane $0 \times \mathbb{R}^{n-k}$, so $f|_S$ attains no maximum at $p$.


7 Multiple Integrals 


In this section we generalize to $n$ variables the one-variable Riemann integration theory appearing in Chapter 3. For simplicity, we assume throughout that the function $f$ we integrate is real-valued, as contrasted to vector-valued, and at first we assume that $f$ is a function of only two variables.

Consider a rectangle $R = [a,b] \times [c,d]$ in $\mathbb{R}^2$. Partitions $P$ and $Q$ of $[a,b]$ and $[c,d]$

$$P : a = x_0 < x_1 < \dots < x_m = b \qquad Q : c = y_0 < y_1 < \dots < y_n = d$$

give rise to a “grid” $G = P \times Q$ of rectangles

$$R_{ij} = I_i \times J_j$$

where $I_i = [x_{i-1}, x_i]$ and $J_j = [y_{j-1}, y_j]$. Let $\Delta x_i = x_i - x_{i-1}$, $\Delta y_j = y_j - y_{j-1}$, and denote the area of $R_{ij}$ as

$$|R_{ij}| = \Delta x_i \, \Delta y_j.$$

Let $S$ be a choice of sample points $(s_{ij}, t_{ij}) \in R_{ij}$. See Figure 117.

Figure 117 A grid and a sample point

Given $f : R \to \mathbb{R}$, the corresponding Riemann sum is

$$R(f, G, S) = \sum_{i=1}^{m} \sum_{j=1}^{n} f(s_{ij}, t_{ij}) \, |R_{ij}|.$$

If there is a number to which the Riemann sums converge as the mesh of the grid (the diameter of the largest rectangle) tends to zero then $f$ is Riemann integrable and that number is the Riemann integral

$$\int_R f = \lim_{\operatorname{mesh} G \to 0} R(f, G, S).$$

The lower and upper sums of a bounded function $f$ with respect to the grid $G$ are

$$L(f, G) = \sum m_{ij} |R_{ij}| \qquad U(f, G) = \sum M_{ij} |R_{ij}|$$

where $m_{ij}$ and $M_{ij}$ are the infimum and supremum of $f(s,t)$ as $(s,t)$ varies over $R_{ij}$. The lower integral is the supremum of the lower sums and the upper integral is the infimum of the upper sums.
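As a numerical illustration of these definitions, here is a minimal sketch that computes a lower sum, a midpoint Riemann sum, and an upper sum on a uniform grid. It rests on assumptions not made in the text: the helper name `grid_sums`, the uniform grid, the midpoint sample points, and the restriction to functions that are nondecreasing in each variable (so the infimum and supremum on each grid rectangle sit at opposite corners). The test function $f(x,y) = xy$ on the unit square has integral $1/4$.

```python
def grid_sums(f, a, b, c, d, m, n):
    """Lower, midpoint-sample, and upper sums of f over [a,b] x [c,d] on a uniform m x n grid.

    Assumes f is nondecreasing in each variable, so that the infimum and supremum
    on each grid rectangle are attained at its lower-left and upper-right corners.
    """
    dx = (b - a) / m
    dy = (d - c) / n
    lower = riemann = upper = 0.0
    for i in range(1, m + 1):
        x0, x1 = a + (i - 1) * dx, a + i * dx
        for j in range(1, n + 1):
            y0, y1 = c + (j - 1) * dy, c + j * dy
            area = dx * dy                       # |R_ij|
            lower += f(x0, y0) * area            # m_ij |R_ij|
            upper += f(x1, y1) * area            # M_ij |R_ij|
            riemann += f(0.5 * (x0 + x1), 0.5 * (y0 + y1)) * area
    return lower, riemann, upper


L, Rsum, U = grid_sums(lambda x, y: x * y, 0.0, 1.0, 0.0, 1.0, 100, 100)
```

Refining the grid squeezes `L` and `U` toward each other, with every Riemann sum trapped between them.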

The proofs of the following facts are conceptually identical to the one-dimensional versions explained in Chapter 3:

(a) If $f$ is Riemann integrable then it is bounded.

(b) The set of Riemann integrable functions $R \to \mathbb{R}$ is a vector space $\mathcal{R} = \mathcal{R}(R)$ and integration is a linear map $\mathcal{R} \to \mathbb{R}$.

(c) The constant function $f = k$ is integrable and its integral is $k|R|$.

(d) If $f, g \in \mathcal{R}$ and $f \le g$ then

$$\int_R f \le \int_R g.$$

(e) Every lower sum is less than or equal to every upper sum, and consequently the lower integral is no greater than the upper integral,

$$\underline{\int_R} f \le \overline{\int_R} f.$$

(f) For a bounded function, Riemann integrability is equivalent to the equality of the lower and upper integrals, and integrability implies equality of the lower, upper, and Riemann integrals.

The Riemann-Lebesgue Theorem is another result that generalizes naturally to multiple integrals. It states that a bounded function is Riemann integrable if and only if its discontinuities form a zero set.

First of all, $Z \subset \mathbb{R}^2$ is a zero set if for each $\epsilon > 0$ there is a countable covering of $Z$ by open rectangles $S_\ell$ whose total area is less than $\epsilon$:

$$\sum_{\ell} |S_\ell| < \epsilon.$$

By the $\epsilon/2^\ell$ construction, a countable union of zero sets is a zero set.

As in dimension 1, we express the discontinuity set of our function $f : R \to \mathbb{R}$ as the union

$$D = \bigcup_{k \in \mathbb{N}} D_k,$$

where $D_k$ is the set of points $z \in R$ at which the oscillation is $\ge 1/k$. (See Exercise 3.19.) That is,

$$\operatorname{osc}_z f = \lim_{r \to 0} \operatorname{diam}(f(R_r(z))) \ge \frac{1}{k}$$

where $R_r(z)$ is the $r$-neighborhood of $z$ in $R$. The set $D_k$ is compact.

Assume that $f : R \to \mathbb{R}$ is Riemann integrable. It is bounded and its upper and lower integrals are equal. Fix $k \in \mathbb{N}$. Given $\epsilon > 0$, there exists $\delta > 0$ such that if $G$ is a grid with mesh $< \delta$ then

$$U(f, G) - L(f, G) < \epsilon.$$

Fix such a grid $G$. Each $R_{ij}$ in the grid that contains in its interior a point of $D_k$ has $M_{ij} - m_{ij} \ge 1/k$, where $m_{ij}$ and $M_{ij}$ are the infimum and supremum of $f$ on $R_{ij}$. The other points of $D_k$ lie in the zero set of gridlines $x_i \times [c,d]$ and $[a,b] \times y_j$. Since $U - L < \epsilon$, the total area of these rectangles with oscillation $\ge 1/k$ does not exceed $k\epsilon$. Since $k$ is fixed and $\epsilon$ is arbitrary, $D_k$ is a zero set. Taking $k = 1, 2, \dots$ shows that the discontinuity set $D = \bigcup D_k$ is a zero set.

Conversely, assume that $f$ is bounded, say $|f| \le M$, and $D$ is a zero set. Fix any $k \in \mathbb{N}$. Each $z \in R \setminus D_k$ has a neighborhood $W = W_z$ such that

$$\sup\{f(w) : w \in W\} - \inf\{f(w) : w \in W\} < 1/k.$$

Since $D_k$ is a zero set, it can be covered by countably many open rectangles $S_\ell$ of small total area, say

$$\sum |S_\ell| < \sigma.$$

Let $\mathcal{V}$ be the covering of $R$ by the neighborhoods $W$ with small oscillation, and the rectangles $S_\ell$. Since $R$ is compact, $\mathcal{V}$ has a positive Lebesgue number $\lambda$. Take a grid with mesh $< \lambda$. This breaks the sum

$$U - L = \sum (M_{ij} - m_{ij}) |R_{ij}|$$

into two parts - the sum of those terms for which $R_{ij}$ is contained in a neighborhood $W$ with small oscillation, plus a sum of terms for which $R_{ij}$ is contained in one of the rectangles $S_\ell$. The latter sum is less than $2M\sigma$, while the former is less than $|R|/k$. Thus, when $k$ is large and $\sigma$ is small, $U - L$ is small, which implies Riemann integrability. To summarize,

The Riemann-Lebesgue Theorem remains valid for functions of several variables.


Now we come to the first place that multiple integration has something new to say. Suppose that $f : R \to \mathbb{R}$ is bounded and define

$$\underline{F}(y) = \underline{\int_a^b} f(x, y)\, dx \qquad \overline{F}(y) = \overline{\int_a^b} f(x, y)\, dx.$$

For each fixed $y \in [c,d]$, these are the lower and upper integrals of the single-variable function $f_y : [a,b] \to \mathbb{R}$ defined by $f_y(x) = f(x,y)$. They are the integrals of $f(x,y)$ on the slice $y = \text{const}$. See Figure 118.

29 Fubini’s Theorem If $f$ is Riemann integrable then so are $\underline{F}$ and $\overline{F}$. Moreover,

$$\int_R f = \int_c^d \underline{F}\, dy = \int_c^d \overline{F}\, dy.$$

Since $\underline{F} \le \overline{F}$ and the integral of their difference is zero, it follows from the one-dimensional Riemann-Lebesgue Theorem that there exists a linear zero set $Y \subset [c,d]$ such that if $y \notin Y$ then $\underline{F}(y) = \overline{F}(y)$. That is, the integral of $f(x,y)$ with respect to $x$ exists for almost all $y$ and we get the more common way to write the Fubini formula

$$\int_R f = \int_c^d \left[ \int_a^b f(x, y)\, dx \right] dy.$$

Figure 118 Fubini’s Theorem is like sliced bread.


There is, however, an ambiguity in this formula. What is the value of the integrand $\int_a^b f(x, y)\, dx$ when $y \in Y$? For such a $y$, $\underline{F}(y) < \overline{F}(y)$ and the integral of $f(x, y)$ with respect to $x$ does not exist. The answer is that we can choose any value between $\underline{F}(y)$ and $\overline{F}(y)$. The integral with respect to $y$ will be unaffected. See also Exercise 47.


Proof of Fubini’s Theorem We claim that if $P$ and $Q$ are partitions of $[a,b]$ and $[c,d]$ then

$$(9) \qquad L(f, G) \le L(\underline{F}, Q)$$

where $G$ is the grid $P \times Q$.

Fix any partition interval $J_j \subset [c,d]$. If $y \in J_j$ then

$$m_{ij} = \inf\{f(s,t) : (s,t) \in R_{ij}\} \le \inf\{f(s,y) : s \in I_i\} = m_i(f_y).$$

Thus

$$\sum_{i=1}^{m} m_{ij}\, \Delta x_i \le \sum_{i=1}^{m} m_i(f_y)\, \Delta x_i = L(f_y, P) \le \underline{F}(y),$$

and it follows that

$$\sum_{i=1}^{m} m_{ij}\, \Delta x_i \le m_j(\underline{F}).$$

Therefore

$$\sum_{j=1}^{n} \sum_{i=1}^{m} m_{ij}\, \Delta x_i\, \Delta y_j \le \sum_{j=1}^{n} m_j(\underline{F})\, \Delta y_j = L(\underline{F}, Q)$$

which gives (9). Analogously, $U(\overline{F}, Q) \le U(f, G)$. Thus

$$L(f, G) \le L(\underline{F}, Q) \le U(\underline{F}, Q) \le U(\overline{F}, Q) \le U(f, G).$$

Since $f$ is integrable, the outer terms of this inequality differ by arbitrarily little when the mesh of $G$ is small. Taking infima and suprema over all grids $G = P \times Q$ gives

$$\int_R f = \sup L(f, G) \le \sup L(\underline{F}, Q) \le \inf U(\underline{F}, Q) \le \inf U(f, G) = \int_R f.$$

The resulting equality of these five quantities implies that $\underline{F}$ is integrable and its integral on $[c,d]$ equals that of $f$ on $R$. The case of the upper integral is handled in the same way. □


30 Corollary If $f$ is Riemann integrable then the order of integration - first $x$ then $y$ or vice versa - is irrelevant to the value of the iterated integral,

$$\int_c^d \left[ \int_a^b f(x, y)\, dx \right] dy = \int_a^b \left[ \int_c^d f(x, y)\, dy \right] dx.$$

Proof Both iterated integrals equal the integral of $f$ over $R$. □
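A quick numerical check of the corollary, offered as a sketch under assumptions: the integrand $f(x,y) = xy^2$ on $[0,1] \times [0,2]$, the helper `midpoint_integral`, and the midpoint rule with a fixed number of subintervals are illustrative choices, not part of the text. The exact value of the double integral is $4/3$.

```python
def midpoint_integral(g, lo, hi, n=400):
    """Midpoint-rule approximation of the integral of g over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(g(lo + (k + 0.5) * h) for k in range(n))


def f(x, y):
    return x * y * y


a, b, c, d = 0.0, 1.0, 0.0, 2.0

# integrate in x first, then in y
x_then_y = midpoint_integral(lambda y: midpoint_integral(lambda x: f(x, y), a, b), c, d)
# integrate in y first, then in x
y_then_x = midpoint_integral(lambda x: midpoint_integral(lambda y: f(x, y), c, d), a, b)
```

Both orders approximate the same double sum, so they agree up to rounding and both are close to $4/3$.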


A geometric consequence of Fubini’s Theorem concerns the calculation of the area 
of plane regions by a slice method. Corresponding slice methods are valid in 3-space 
and in higher dimensions. 

31 Cavalieri’s Principle The area of a region $S \subset R$ is the integral with respect to $x$ of the length of its vertical slices,

$$\operatorname{area}(S) = \int_a^b \operatorname{length}(S_x)\, dx,$$

provided that the boundary of $S$ is a zero set.

Proof Deriving Cavalieri’s Principle from Fubini’s Theorem is mainly a matter of definition. For we define the length of a subset of $\mathbb{R}$ and the area of a subset of $\mathbb{R}^2$ to be the integrals of their characteristic functions. The requirement that $\partial S$ is a zero set is made so that $\chi_S$ is Riemann integrable. It is met if $S$ has a smooth, or piecewise smooth, boundary. See Appendix B for a delightful discussion of the historical origin of Cavalieri’s Principle, and see Chapter 6 for the more general geometric definition of length and area in terms of outer measure. □
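The slice method is easy to try on the closed unit disc, whose vertical slice at $x$ has length $2\sqrt{1 - x^2}$. The sketch below is an illustration only; the function names and the midpoint rule with `n` subintervals are assumptions made for the example.

```python
import math


def slice_length(x):
    """Length of the vertical slice S_x of the closed unit disc at abscissa x."""
    if abs(x) >= 1.0:
        return 0.0
    return 2.0 * math.sqrt(1.0 - x * x)


def area_by_slices(length, a, b, n=20000):
    """Midpoint-rule approximation of the integral of slice lengths over [a, b]."""
    h = (b - a) / n
    return h * sum(length(a + (k + 0.5) * h) for k in range(n))


disc_area = area_by_slices(slice_length, -1.0, 1.0)
```

The computed value approximates $\pi$, the area of the unit disc.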


The second new aspect of multiple integration concerns the change of variables formula. It is the higher-dimensional version of integration by substitution. We will suppose that $\varphi : U \to W$ is a $C^1$ diffeomorphism between open subsets of $\mathbb{R}^2$, that $R \subset U$, and that a Riemann integrable function $f : W \to \mathbb{R}$ is given. The Jacobian of $\varphi$ at $z \in U$ is the determinant of the derivative,

$$\operatorname{Jac}_z \varphi = \det (D\varphi)_z.$$

32 Change of Variables Formula Under the preceding assumptions we have

$$\int_R f \circ \varphi \cdot |\operatorname{Jac} \varphi| = \int_{\varphi(R)} f.$$

See Figure 119.

Figure 119 $\varphi$ is a change of variables.
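As a sanity check of the formula, here is a hedged sketch using polar coordinates, which are not discussed in the text at this point: $\varphi(r, \theta) = (r\cos\theta, r\sin\theta)$ on the rectangle $R = [1,2] \times [0, \pi/2]$, whose image is a quarter annulus. The names `phi`, `jac_phi`, `lhs_integral` and the midpoint grid are assumptions for illustration. The left-hand side is computed numerically and compared with values of the right-hand side known in closed form.

```python
import math


def phi(r, theta):
    """Polar coordinates map (r, theta) -> (x, y)."""
    return (r * math.cos(theta), r * math.sin(theta))


def jac_phi(r, theta):
    """Determinant of the derivative of phi; for polar coordinates it equals r."""
    return r


def lhs_integral(f, r0, r1, t0, t1, n=200):
    """Midpoint approximation of the integral of (f o phi) |Jac phi| over [r0,r1] x [t0,t1]."""
    dr = (r1 - r0) / n
    dt = (t1 - t0) / n
    total = 0.0
    for i in range(n):
        r = r0 + (i + 0.5) * dr
        for j in range(n):
            t = t0 + (j + 0.5) * dt
            x, y = phi(r, t)
            total += f(x, y) * abs(jac_phi(r, t)) * dr * dt
    return total


# f = 1: the left side should reproduce the area of the quarter annulus phi(R).
area = lhs_integral(lambda x, y: 1.0, 1.0, 2.0, 0.0, math.pi / 2)
# f = x**2 + y**2: the integral over the quarter annulus is (pi/2) * (2**4 - 1**4) / 4.
moment = lhs_integral(lambda x, y: x * x + y * y, 1.0, 2.0, 0.0, math.pi / 2)
```

With $f = 1$ the formula reduces to the area of $\varphi(R)$, here $3\pi/4$, which is the role the Jacobian plays as a local area multiplier.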

If $S$ is a bounded subset of $\mathbb{R}^2$, its area (or Jordan content) is by definition the integral of its characteristic function $\chi_S$, if the integral exists. When the integral does exist we say that $S$ is Riemann measurable. See also Appendix D of Chapter 6. According to the Riemann-Lebesgue Theorem, $S$ is Riemann measurable if and only if its boundary is a zero set. For $\chi_S$ is discontinuous at $z$ if and only if $z$ is a boundary point of $S$. See Exercise 44. The characteristic function of a rectangle $R$ is Riemann integrable and its integral is $|R|$, so we are justified in using the same notation for area of a general set $S$, namely,

$$|S| = \operatorname{area}(S) = \int \chi_S.$$

33 Proposition If $T : \mathbb{R}^2 \to \mathbb{R}^2$ is an isomorphism then for every Riemann measurable set $S \subset \mathbb{R}^2$, $T(S)$ is Riemann measurable and

$$|T(S)| = |\det T|\,|S|.$$

Proposition 33 is a version of the Change of Variables Formula in which $\varphi = T$, $R = S$, and $f = 1$. It remains true for $n$-dimensional volume and leads to a definition of the determinant of a linear transformation as a “volume multiplier.”
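The area-multiplier statement can be checked directly on the unit square. This is a small sketch with assumed ingredients: the particular matrix `T`, the helper names, and the use of the shoelace formula to measure the image parallelogram are choices made for illustration.

```python
def apply_matrix(T, p):
    """Apply the 2 x 2 matrix T (given as a pair of rows) to the point p."""
    (a, b), (c, d) = T
    x, y = p
    return (a * x + b * y, c * x + d * y)


def det(T):
    (a, b), (c, d) = T
    return a * d - b * c


def polygon_area(vertices):
    """Area of a simple polygon by the shoelace formula."""
    total = 0.0
    n = len(vertices)
    for k in range(n):
        x0, y0 = vertices[k]
        x1, y1 = vertices[(k + 1) % n]
        total += x0 * y1 - x1 * y0
    return abs(total) / 2.0


T = ((2.0, 1.0), (1.0, 3.0))                                  # hypothetical isomorphism
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]     # S with |S| = 1
image = [apply_matrix(T, p) for p in square]                  # vertices of T(S)

image_area = polygon_area(image)
```

Here `image_area` agrees with `abs(det(T)) * polygon_area(square)`, in line with the proposition.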

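Proof As is shown in linear algebra, the matrix $A$ that represents $T$ is a product of elementary matrices

$$A = E_1 \cdots E_k.$$

Each elementary $2 \times 2$ matrix is one of the following types:

$$\begin{bmatrix} \lambda & 0 \\ 0 & 1 \end{bmatrix} \qquad \begin{bmatrix} 1 & 0 \\ 0 & \lambda \end{bmatrix} \qquad \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \qquad \begin{bmatrix} 1 & \sigma \\ 0 & 1 \end{bmatrix}$$

where $\lambda > 0$. The first three matrices represent isomorphisms whose effect on $I^2$ is obvious: $I^2$ is converted to the rectangles $\lambda I \times I$, $I \times \lambda I$, and $I^2$. In each case the area agrees with the magnitude of the determinant. The fourth matrix is a shear matrix. Its isomorphism converts $I^2$ to the parallelogram

$$\Pi = \{(x, y) \in \mathbb{R}^2 : \sigma y \le x \le 1 + \sigma y \text{ and } 0 \le y \le 1\}.$$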

$\Pi$ is Riemann measurable since its boundary is a zero set. By Fubini’s Theorem, we get

$$|\Pi| = \int \chi_\Pi = \int_0^1 \left[ \int_{x = \sigma y}^{x = 1 + \sigma y} 1\, dx \right] dy = 1 = \det E.$$

Exactly the same thinking shows that for any rectangle $R$, not merely the unit square, we have

$$(10) \qquad |E(R)| = |\det E|\,|R|.$$

We claim that (10) implies that for any Riemann measurable set $S$, $E(S)$ is Riemann measurable and

$$(11) \qquad |E(S)| = |\det E|\,|S|.$$
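Let $\epsilon > 0$ be given. Choose a grid $G$ on $R \supset S$ with mesh so small that the rectangles $R$ of $G$ satisfy

$$(12) \qquad |S| - \epsilon \le \sum_{R \subset S} |R| \le \sum_{R \cap S \ne \emptyset} |R| \le |S| + \epsilon.$$

The interiors of the inner rectangles - those with $R \subset S$ - are disjoint, and therefore for each $z \in \mathbb{R}^2$ we have

$$\sum_{R \subset S} \chi_{\operatorname{int} R}(z) \le \chi_S(z).$$

The same is true after we apply $E$, namely

$$\sum_{R \subset S} \chi_{\operatorname{int}(E(R))}(z) \le \chi_{E(S)}(z).$$

Linearity and monotonicity of the integral, and Riemann measurability of the sets $E(R)$ imply that

$$(13) \qquad \sum_{R \subset S} |E(R)| = \sum_{R \subset S} \int \chi_{\operatorname{int}(E(R))} = \int \sum_{R \subset S} \chi_{\operatorname{int}(E(R))} \le \underline{\int} \chi_{E(S)}.$$

Similarly,

$$\chi_{E(S)}(z) \le \sum_{R \cap S \ne \emptyset} \chi_{E(R)}(z)$$

which implies that

$$(14) \qquad \overline{\int} \chi_{E(S)} \le \int \sum_{R \cap S \ne \emptyset} \chi_{E(R)} = \sum_{R \cap S \ne \emptyset} |E(R)|.$$

By (10) and (12), (13) and (14) become

$$\begin{aligned} |\det E|\,(|S| - \epsilon) &\le |\det E| \sum_{R \subset S} |R| \le \underline{\int} \chi_{E(S)} \le \overline{\int} \chi_{E(S)} \\ &\le |\det E| \sum_{R \cap S \ne \emptyset} |R| \le |\det E|\,(|S| + \epsilon). \end{aligned}$$

Since these upper and lower integrals do not depend on $\epsilon$ and $\epsilon$ is arbitrarily small, they equal the common value $|\det E|\,|S|$, which completes the proof of (11).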

The determinant of a matrix product is the product of the determinants. Since the matrix of $T$ is the product of elementary matrices, $E_1 \cdots E_k$, (11) implies that if $S$ is Riemann measurable then so is $T(S)$ and

$$|T(S)| = |E_1 \cdots E_k(S)| = |\det E_1| \cdots |\det E_k|\,|S| = |\det T|\,|S|.$$

□
We isolate two more facts in preparation for the proof of the Change of Variables Formula.

34 Lemma Suppose that $\psi : U \to \mathbb{R}^2$ is $C^1$, $0 \in U$, $\psi(0) = 0$, and for all $u \in U$ we have

$$\|(D\psi)_u - \mathrm{Id}\| \le \epsilon.$$

If $U_r(0) \subset U$ then

$$\psi(U_r(0)) \subset U_{(1+\epsilon)r}(0).$$

Proof By $U_r(p)$ we denote the $r$-neighborhood of $p$ in $U$. The $C^1$ Mean Value Theorem gives

$$\psi(u) = \psi(u) - \psi(0) = \int_0^1 (D\psi)_{tu}\, dt\, (u) = \int_0^1 \big( (D\psi)_{tu} - \mathrm{Id} \big)\, dt\, (u) + u.$$

If $|u| < r$ this implies that $|\psi(u)| < (1 + \epsilon) r$; i.e., $\psi(U_r(0)) \subset U_{(1+\epsilon)r}(0)$. □

Lemma 34 is valid for any choice of norm on $\mathbb{R}^2$, in particular for the maximum coordinate norm. In that case the inclusion refers to squares: the square of radius $r$ is carried by $\psi$ inside the square of radius $(1 + \epsilon) r$.

35 Lemma The Lipschitz image of a zero set is a zero set.

Proof Suppose that $Z$ is a zero set and $h : Z \to \mathbb{R}^2$ satisfies a Lipschitz condition

\[ |h(z) - h(z')| \le L\,|z - z'|. \]

Given $\epsilon > 0$, there is a countable covering of $Z$ by squares $S_k$ such that

\[ \sum_k |S_k| < \epsilon. \]

See Exercise 45. Each set $S_k \cap Z$ has diameter $\le \operatorname{diam} S_k$ and therefore $h(Z \cap S_k)$
has diameter $\le L \operatorname{diam} S_k$. As such it is contained in a square $S'_k$ of edge length
$L \operatorname{diam} S_k$. The squares $S'_k$ cover $h(Z)$ and

\[ \sum_k |S'_k| \le L^2 \sum_k (\operatorname{diam} S_k)^2 = 2L^2 \sum_k |S_k| \le 2L^2\epsilon. \]

Therefore $h(Z)$ is a zero set.


□

Proof of the Change of Variables Formula Recall that $\varphi : U \to W$ is a $C^1$
diffeomorphism, $f : W \to \mathbb{R}$ is Riemann integrable, $R$ is a rectangle in $U$, and it is
asserted that

\[ \int_R f \circ \varphi \cdot |\operatorname{Jac} \varphi| = \int_{\varphi(R)} f. \tag{15} \]

Let $D'$ be the set of discontinuity points of $f$. It is a zero set. Then

\[ D = \varphi^{-1}(D') \]

is the set of discontinuity points of $f \circ \varphi$. The $C^1$ Mean Value Theorem implies that
$\varphi^{-1}$ is Lipschitz, Lemma 35 implies that $D$ is a zero set, and the Riemann-Lebesgue
Theorem implies that $f \circ \varphi$ is Riemann integrable. Since $|\operatorname{Jac} \varphi|$ is continuous, it is
Riemann integrable and so is the product $f \circ \varphi \cdot |\operatorname{Jac} \varphi|$. In short, the l.h.s. of (15)
makes sense.


Since $\varphi$ is a diffeomorphism, it is a homeomorphism and it carries the boundary
of $R$ to the boundary of $\varphi(R)$. The former boundary is a zero set and by Lemma 35 so
is the latter. Thus $\chi_{\varphi(R)}$ is Riemann integrable. Choose a rectangle $R'$ that contains
$\varphi(R)$. Then the r.h.s. of (15) becomes

\[ \int_{\varphi(R)} f = \int_{R'} f \cdot \chi_{\varphi(R)}, \]

which also makes sense. It remains to show that the two sides of (15) not only make
sense but are equal.

Equip $\mathbb{R}^2$ with the maximum coordinate norm and equip $\mathcal{L}(\mathbb{R}^2, \mathbb{R}^2)$ with the
associated operator norm

\[ \|T\| = \max\{ |T(v)|_{\max} : |v|_{\max} \le 1 \}. \]


Let $\epsilon > 0$ be given. Take any grid $G$ that partitions $R$ into squares $R_{ij}$ of radius
$r$. (The smallness of $r$ will be specified below.) Let $z_{ij}$ be the center point of $R_{ij}$ and
call

\[ A_{ij} = (D\varphi)_{z_{ij}} \qquad \varphi(z_{ij}) = w_{ij} \qquad \varphi(R_{ij}) = W_{ij}. \]

The Taylor approximation to $\varphi$ on $R_{ij}$ is

\[ \phi_{ij}(z) = w_{ij} + A_{ij}(z - z_{ij}). \]

The composite $\psi = \phi_{ij}^{-1} \circ \varphi$ sends $z_{ij}$ to itself and its derivative at $z_{ij}$ is the
identity transformation. Uniform continuity of $(D\varphi)_z$ on $R$ implies that if $r$ is small
enough then for all $z \in R_{ij}$ and for all $ij$ we have $\|(D\psi)_z - \mathrm{Id}\| \le \epsilon$. By Lemma 34
we have

\[ \phi_{ij}^{-1} \circ \varphi(R_{ij}) \subset (1+\epsilon)R_{ij} \tag{16} \]

where $(1+\epsilon)R_{ij}$ refers to the $(1+\epsilon)$-dilation of $R_{ij}$ centered at $z_{ij}$. Similarly, Lemma 34
applies to the composite $\varphi^{-1} \circ \phi_{ij}$ and, taking the radius $r/(1+\epsilon)$ instead of $r$, we
get

\[ \varphi^{-1} \circ \phi_{ij}\big((1+\epsilon)^{-1}R_{ij}\big) \subset R_{ij}. \tag{17} \]

See Figure 120. Then (16) and (17) imply

\[ \phi_{ij}\big((1+\epsilon)^{-1}R_{ij}\big) \subset \varphi(R_{ij}) = W_{ij} \subset \phi_{ij}\big((1+\epsilon)R_{ij}\big). \]

Figure 120 How we magnify the picture and sandwich a nonlinear
parallelogram between two linear ones

By Proposition 33 this gives the area estimate

\[ \frac{J_{ij}|R_{ij}|}{(1+\epsilon)^2} \le |W_{ij}| \le (1+\epsilon)^2 J_{ij}|R_{ij}| \]

where $J_{ij} = |\operatorname{Jac}_{z_{ij}} \varphi|$. Equivalently


\[ \frac{1}{(1+\epsilon)^2} \le \frac{|W_{ij}|}{J_{ij}|R_{ij}|} \le (1+\epsilon)^2. \tag{18} \]

An estimate of the form

\[ \frac{1}{(1+\epsilon)^2} \le \frac{a}{b} \le (1+\epsilon)^2 \]

with $0 \le \epsilon \le 1$ and $a, b > 0$ implies that

\[ |a - b| \le 16\epsilon b \]

as you are left to check in Exercise 40. Thus (18) implies


\[ \big|\,|W_{ij}| - J_{ij}|R_{ij}|\,\big| \le 16\epsilon J\,|R_{ij}| \tag{19} \]

where $J = \sup\{|\operatorname{Jac}_z \varphi| : z \in R\}$.

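Let $m_{ij}$ and $M_{ij}$ be the infimum and supremum of $f \circ \varphi$ on $R_{ij}$. Then, for all
$w \in \varphi(R)$ we have

\[ \sum m_{ij}\,\chi_{\operatorname{int} W_{ij}}(w) \le f(w) \le \sum M_{ij}\,\chi_{W_{ij}}(w) \]

which integrates to

\[ \sum m_{ij}|W_{ij}| \le \int_{\varphi(R)} f \le \sum M_{ij}|W_{ij}|. \]

According to (19), replacing $|W_{ij}|$ by $J_{ij}|R_{ij}|$ causes an error of no more than $16\epsilon J|R_{ij}|$.
Thus

\[ \sum m_{ij}J_{ij}|R_{ij}| - 16\epsilon MJ|R| \le \int_{\varphi(R)} f \le \sum M_{ij}J_{ij}|R_{ij}| + 16\epsilon MJ|R| \]

where $M = \sup|f|$. These are lower and upper sums for the integrable function
$f \circ \varphi \cdot |\operatorname{Jac} \varphi|$. Thus

\[ \int_R f \circ \varphi \cdot |\operatorname{Jac} \varphi| - 16\epsilon MJ|R| \le \int_{\varphi(R)} f \le \int_R f \circ \varphi \cdot |\operatorname{Jac} \varphi| + 16\epsilon MJ|R|. \]

Since $\epsilon$ is arbitrarily small the proof is complete.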


□ 
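A numerical sanity check of (15) may help fix ideas. The sketch below assumes the polar coordinate map $\varphi(r,\theta) = (r\cos\theta, r\sin\theta)$ on $R = [1,2] \times [0,\pi/2]$ and $f(x,y) = x^2 + y^2$; these choices, the grid sizes, and the function names are illustrative only. The left side is a midpoint sum over $R$ with $|\operatorname{Jac}\varphi| = r$; the right side integrates $f \cdot \chi_{\varphi(R)}$ over the square $R' = [0,2]^2$, as in the proof.

```python
import math


def lhs_integral(n=200):
    """Midpoint-rule value of the integral over R of (f o phi) * |Jac phi|."""
    r0, r1 = 1.0, 2.0
    t0, t1 = 0.0, math.pi / 2
    dr, dt = (r1 - r0) / n, (t1 - t0) / n
    total = 0.0
    for i in range(n):
        r = r0 + (i + 0.5) * dr
        for j in range(n):
            theta = t0 + (j + 0.5) * dt
            x, y = r * math.cos(theta), r * math.sin(theta)
            total += (x * x + y * y) * r * dr * dt  # |Jac phi| = r
    return total


def rhs_integral(n=400):
    """Midpoint-rule value of the integral of f * chi_{phi(R)} over R' = [0,2]^2."""
    h = 2.0 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        for j in range(n):
            y = (j + 0.5) * h
            s = x * x + y * y
            if 1.0 <= s <= 4.0:  # (x, y) lies in the quarter annulus phi(R)
                total += s * h * h
    return total


exact = 15 * math.pi / 8  # integral of r^3 over [1,2], times pi/2
```

The right side converges more slowly because the characteristic function is discontinuous along the boundary of $\varphi(R)$, which is exactly the zero set that Lemma 35 takes care of.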


Finally, here is a sketch of the n-dimensional theory. Instead of a two-dimensional
rectangle we have a box

\[ R = [a_1, b_1] \times \cdots \times [a_n, b_n]. \]

Riemann sums of a function $f : R \to \mathbb{R}$ are defined as before: Take a grid $G$ of small
boxes $R_\ell$ in $R$, take a sample point $s_\ell$ in each, and set

\[ R(f, G, S) = \sum_\ell f(s_\ell)\,|R_\ell| \]

where $|R_\ell|$ is the product of the edge lengths of the small box $R_\ell$ and $S$ is the set of
sample points. If the Riemann sums converge to a limit it is the integral. The general
theory, including the Riemann-Lebesgue Theorem, is the same as in dimension 2.


326 


Multivariable Calculus 


Chapter 5 


Fubini’s Theorem is proved by induction on $n$, and has the same meaning:
Integration on a box can be done slice by slice, and the order in which the iterated
integration is performed has no effect on the answer.

The Change of Variables Formula has the same statement, only now the Jacobian
is the determinant of an $n \times n$ matrix. In place of area we have volume, the
$n$-dimensional volume of a set $S \subset \mathbb{R}^n$ being the integral of its characteristic function.
The volume-multiplier formula, Proposition 33, has essentially the same proof but the
elementary matrix notation is messier. (It helps to realize that the following types of
elementary row operations suffice for row reduction: Transposition of two adjacent
rows, multiplication of the first row by $\lambda$, and addition of the second row to the first.)
The proof of the Change of Variables Formula itself differs only in that 16 becomes
$4^n$.


8 Differential Forms 

The Riemann integral notation

\[ \sum_{i=1}^n f(t_i)\,\Delta x_i \approx \int_a^b f(x)\,dx \]

may lead one to imagine the integral as an “infinite sum of infinitely small quantities
$f(x)\,dx$.” Although this idea itself seems to lead nowhere, it points to a good
question - how do you give an independent meaning to the symbol $f\,dx$? The answer:
differential forms. Not only does the theory of differential forms supply coherent,
independent meanings for $f\,dx$, $dx$, $dy$, $df$, $dx\,dy$, and even for $d$ and $x$ separately,
but it also unifies vector calculus results. A single result, the General Stokes Formula
for differential forms

\[ \int_M d\omega = \int_{\partial M} \omega, \]

encapsulates all integral theorems about divergence, gradient, and curl.

The presentation of differential forms in this section appears in the natural
generality of $n$ dimensions, and as a consequence it is unavoidably fraught with complicated
index notation - armies of $i$’s, $j$’s, double subscripts, multi-indices, and so on. Your
endurance may be tried.

First, consider a function $y = F(x)$. Normally, you think of $F$ as the function,
$x$ as the input variable, and $y$ as the output variable. But you can also take a dual
approach and think of $x$ as the function, $F$ as the input variable, and $y$ as the output
variable. After all, why not? It’s a kind of mathematical yin/yang.

Now consider a path integral the way it is defined in calculus,

\[ \int_C f\,dx + g\,dy = \int_0^1 f(x(t), y(t))\,\frac{dx(t)}{dt}\,dt + \int_0^1 g(x(t), y(t))\,\frac{dy(t)}{dt}\,dt. \]

$f$ and $g$ are smooth real-valued functions of $(x, y)$ and $C$ is a smooth path
parameterized by $(x(t), y(t))$ as $t$ varies on $[0, 1]$. Normally you think of the integral as a
number that depends on the functions $f$ and $g$. Taking the dual approach you can
think of it as a number that depends on the path $C$. This will be our point of view.
It parallels that found in Rudin’s Principles of Mathematical Analysis.

Definition A differential 1-form is a function that sends paths to real numbers
and which can be expressed as a path integral in the previous notation. The name
of this particular differential 1-form is $f\,dx + g\,dy$.


In a way, this definition begs the question. For it simply says that the standard
calculus formula for path integrals should be read in a new way - as a function of the
integration domain. Doing so, however, is illuminating, for it leads you to ask: Just
what property of $C$ does the differential 1-form $f\,dx + g\,dy$ measure?


First take the case that $f(x, y) = 1$ and $g(x, y) = 0$. Then the path integral is

\[ \int_C dx = \int_a^b \frac{dx(t)}{dt}\,dt = x(b) - x(a) \]

which is the “net $x$-variation” of the path $C$. This can be written in functional
notation as

\[ dx : C \mapsto x(b) - x(a). \]


It means that $dx$ assigns to each path $C$ its net $x$-variation. Similarly $dy$ assigns to
each path its net $y$-variation. The word “net” is important. Negative $x$-variation
cancels positive $x$-variation, and negative $y$-variation cancels positive $y$-variation. In
the world of forms, orientation matters.


What about $f\,dx$? The function $f$ “weights” $x$-variation. If the path $C$ passes
through a region in which $f$ is large, its $x$-variation is magnified accordingly, and the
integral $\int_C f\,dx$ reflects the net $f$-weighted $x$-variation of $C$. In functional notation

\[ f\,dx : C \mapsto \text{net } f\text{-weighted } x\text{-variation of } C. \]

Similarly, $g\,dy$ assigns to a path its net $g$-weighted $y$-variation, and the 1-form $f\,dx +
g\,dy$ assigns to $C$ the sum of the two variations.

Terminology A functional on a set $X$ is a function from $X$ to $\mathbb{R}$.


Figure 121 suggests why $\int_C y\,dx$ is positive and $\int_{C'} y\,dx$ is negative: The weight
factor is positive on $C$ and negative on $C'$. On the other hand, if the weight factor is
the constant $c$ then both integrals are $c(q - p)$.


Figure 121 $C$ and $C'$ are paths from $p$ to $q$ where $p$ and $q$ lie on the
$x$-axis. The integrals $\int_C y\,dx$ and $\int_{C'} y\,dx$ express the net $y$-weighted
$x$-variation along $C$ and $C'$.

Differential 1-forms are functionals on the set of paths. Some functionals on the
set of paths are differential forms but others are not. For instance, assigning to each
path its arclength is a functional that is not a form. For if $C$ is a path parameterized
by $(x(t), y(t))$ then $(x^*(t), y^*(t)) = (x(a + b - t), y(a + b - t))$ parameterizes $C$ in
the reverse direction. Arclength is unaffected but the value of every 1-form on the
path changes sign. Hence, arclength is not a 1-form. A more trivial example is the
functional that assigns to each path the number 1. It too fails to have the right
symmetry property under parameter reversal and is not a 1-form.
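The contrast between a 1-form and arclength can be seen numerically. The sketch below treats a path as a Python function on $[0,1]$ and approximates $\int_C f\,dx + g\,dy$ by a Riemann sum; the semicircle, the step count, and all function names are assumptions made for illustration.

```python
import math


def integrate_one_form(f, g, path, n=2000):
    """Riemann-sum value of the integral of f dx + g dy along path: [0,1] -> R^2."""
    total = 0.0
    for i in range(n):
        t0, t1 = i / n, (i + 1) / n
        x0, y0 = path(t0)
        x1, y1 = path(t1)
        xm, ym = path(0.5 * (t0 + t1))
        # weight at the midpoint times the x- and y-increments of the chord
        total += f(xm, ym) * (x1 - x0) + g(xm, ym) * (y1 - y0)
    return total


def arclength(path, n=2000):
    """Length of the inscribed polygon with n chords."""
    total = 0.0
    for i in range(n):
        x0, y0 = path(i / n)
        x1, y1 = path((i + 1) / n)
        total += math.hypot(x1 - x0, y1 - y0)
    return total


def upper_semicircle(t):
    """A path C from p = (-1, 0) to q = (1, 0) through the upper half plane."""
    return (-math.cos(math.pi * t), math.sin(math.pi * t))


def reverse(path):
    """The same path traversed in the opposite direction."""
    return lambda t: path(1.0 - t)


weight_y = lambda x, y: y    # coefficient of dx in the 1-form y dx
zero = lambda x, y: 0.0      # coefficient of dy

forward = integrate_one_form(weight_y, zero, upper_semicircle)
backward = integrate_one_form(weight_y, zero, reverse(upper_semicircle))
```

For this path the exact value of $\int_C y\,dx$ is $\pi/2$, so `forward` should be close to $\pi/2$ and `backward` close to $-\pi/2$, while `arclength` returns nearly $\pi$ in both directions.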

Definition A $k$-cell in $\mathbb{R}^n$ is a smooth map $\varphi : I^k \to \mathbb{R}^n$ where $I^k$ is the unit $k$-cube.
If $k = 1$ then $\varphi$ is a path. The set of $k$-cells is $C_k(\mathbb{R}^n)$.


A $k$-cell $\varphi$ need not be a diffeomorphism to its image. $\varphi$ can be noninjective
and its derivative can have zero determinant at many points. For this reason cells
are often called “singular cells.” Singularities are permitted. For example, if $e$ is
the smooth function that is $e^{-1/t}$ for $t > 0$ and identically zero for $t \le 0$ then
$t \mapsto (e(|t - 1/2|)^2, e(|t - 1/2|))$ is a smooth 1-cell in the plane, despite the fact that
its image has a cusp at the origin. See Figure 122.

Figure 122 This smooth 1-cell is a path with a cusp. It is part of the
graph of $y = \sqrt{|x|}$.


This flexibility is a good thing. It lets the closed disc and many other planar
regions be (the images of) 2-cells. See page 354, Figure 130, and Exercise 70.

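Integrating a $k$-form over a $k$-cell $\varphi$ with $k \ge 2$ requires Jacobian determinants.
To simplify notation we write $I = (i_1, \dots, i_k)$ and $J = (j_1, \dots, j_k)$ for $k$-tuples of
integers. Then $\partial\varphi_I/\partial u_J$ is the $k \times k$ determinant

\[ \frac{\partial \varphi_I}{\partial u_J} = \det \begin{bmatrix} \dfrac{\partial \varphi_{i_1}}{\partial u_{j_1}} & \cdots & \dfrac{\partial \varphi_{i_1}}{\partial u_{j_k}} \\ \vdots & & \vdots \\ \dfrac{\partial \varphi_{i_k}}{\partial u_{j_1}} & \cdots & \dfrac{\partial \varphi_{i_k}}{\partial u_{j_k}} \end{bmatrix}. \]

If $I = (i)$ and $J = (j)$ then $\partial\varphi_I/\partial u_J$ is just $\partial\varphi_i/\partial u_j$, while if $I = (1, 2)$ and $J = (5, 7)$
then $\partial\varphi_I/\partial u_J$ is the $2 \times 2$ determinant

\[ \frac{\partial \varphi_I}{\partial u_J} = \frac{\partial(\varphi_1, \varphi_2)}{\partial(u_5, u_7)} = \det \begin{bmatrix} \dfrac{\partial \varphi_1}{\partial u_5} & \dfrac{\partial \varphi_1}{\partial u_7} \\ \dfrac{\partial \varphi_2}{\partial u_5} & \dfrac{\partial \varphi_2}{\partial u_7} \end{bmatrix}. \]

Notation The letters $s$, $t$, and $u = (u_1, \dots, u_k)$ will denote, as often as possible,
dummy integration variables. They label points in the domain of definition of a
$k$-cell, namely $I^k$. For instance $I^2 = \{(s, t) : 0 \le s, t \le 1\}$. The letters $x_1, \dots, x_n$
will be used to name forms in the target space $\mathbb{R}^n$ of the cells. For example $dx_1 dx_5$
is a 2-form in $\mathbb{R}^n$ with $n \ge 5$. In $\mathbb{R}^3$ we will name forms with $x, y, z$ variables. For
example $dx\,dy$ is a 2-form in $\mathbb{R}^3$. It is the same as $dx_1 dx_2$ but $dx\,dy$ is a more familiar
name for it. A planar path $\varphi$ is $\varphi(t) = (\varphi_1(t), \varphi_2(t)) = (x(t), y(t))$.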

Definition The $x_I$-area of $\varphi$ is defined by the functional on $C_k(\mathbb{R}^n)$, the set of $k$-cells,

\[ dx_I : \varphi \mapsto \int_{I^k} \frac{\partial \varphi_I}{\partial u}\,du \]

where $I = (i_1, \dots, i_k)$, $\varphi_I = (\varphi_{i_1}, \dots, \varphi_{i_k})$, and the integral notation is shorthand for

\[ \int_0^1 \cdots \int_0^1 \frac{\partial(\varphi_{i_1}, \dots, \varphi_{i_k})}{\partial(u_1, \dots, u_k)}\,du_1 \cdots du_k. \]

For 1-forms the definition is nothing new. The integral of $dx$ on the path $\varphi(t) =
(x(t), y(t))$ is the integral of the $1 \times 1$ Jacobian $dx(t)/dt$, namely

\[ \int_0^1 \frac{dx(t)}{dt}\,dt = x(1) - x(0) \]

which is the net $x$-variation of $\varphi$. In the $x_I$-area terminology it is the $x$-area of $\varphi$.

Just as for paths, $x_I$-area can be positive or negative. It is the signed area of
the shadow of $\varphi$ on the $x_I$-plane, i.e., the signed area of its projection $\pi_I(\varphi(I^k))$.
After all, the Jacobian can be negative and it only involves the $I$-components of $\varphi$.
No components $\varphi_j$ with $j \notin I$ appear in $\partial\varphi_I/\partial u$. See Figure 123.



Figure 123 A pseudopod emerging from a rectangle. It is a 2-cell $\varphi$ in $\mathbb{R}^3$
that casts a shadow in the $xy$-plane.

If $f$ is a smooth function on $\mathbb{R}^n$ then $f\,dx_I$ is the functional

\[ f\,dx_I : \varphi \mapsto \int_{I^k} f(\varphi(u))\,\frac{\partial \varphi_I}{\partial u}\,du. \]

The function $f$ “weights” $x_I$-area. The functional $dx_I$ is a basic $k$-form and $f\,dx_I$
is a simple $k$-form, while a sum of simple $k$-forms is a general $k$-form:

\[ \omega = \sum_I f_I\,dx_I : \varphi \mapsto \sum_I (f_I\,dx_I)(\varphi). \]
I I 


The careful reader will detect some abuse of notation. Here $I$ is used to index
a collection of scalar coefficient functions $\{f_I\}$, whereas $I$ is also used to reduce an
$m$-vector $(F_1, \dots, F_m)$ to a $k$-vector $F_I = (F_{i_1}, \dots, F_{i_k})$. Besides this, $I$ is the unit
interval. Please persevere.

To underline the fact that a form is an integral we write

\[ \omega(\varphi) = \int_\varphi \omega. \]

Notation $C_k(\mathbb{R}^n)$ is the set of all $k$-cells in $\mathbb{R}^n$, $C^k(\mathbb{R}^n)$ is the set of all functionals
on $C_k(\mathbb{R}^n)$, and $\Omega^k(\mathbb{R}^n)$ is the set of $k$-forms on $\mathbb{R}^n$.


Because a determinant changes sign under a row transposition, $k$-forms satisfy
the signed commutativity property: If $\pi$ permutes $I$ to $\pi I$ then

\[ dx_{\pi I} = \operatorname{sgn}(\pi)\,dx_I \]

where $\operatorname{sgn}(\pi)$ is the sign of the permutation $\pi$. In particular, $dx_{(1,2)} = -dx_{(2,1)}$
signifies that $xy$-area is the negative of $yx$-area, that is $dx\,dy = -dy\,dx$, a formula
that is certainly familiar from Sophomore Calculus. Because a determinant is zero if
it has a repeated row, $dx_I = 0$ if $I$ has a repeated entry. In particular $dx\,dx$ is the
zero functional on $C_2(\mathbb{R}^2)$.

Upshot The integral of the basic 2-form $dx\,dy$ over a 2-cell $\varphi$ in $\mathbb{R}^3$ is the net area of
its shadow on the $xy$-plane. (“Net” means negative area cancels positive area.) The
same holds for the other coordinate planes and in higher dimensions - net shadow
area equals the integral of the basic form.


Example Consider a 2-cell $\varphi : I^2 \to \mathbb{R}^3$. What is its $xy$-area? By definition it is the
integral of the Jacobian $\partial(\varphi_1, \varphi_2)/\partial(s, t)$ over the unit square in $(s, t)$-space. Suppose
that $\varphi$ is given by the formula

\[ \varphi(s, t) = \begin{cases} (s,\; t(1 - ms),\; t) & \text{if } 0 \le s \le 1/2 \\ (s,\; t(1 - m + ms),\; t) & \text{if } 1/2 \le s \le 1. \end{cases} \]

$\varphi$ is only piecewise smooth but never mind. If the slope $m$ is 4 then the signed $xy$-area
of $\varphi$ is zero. If $m > 4$ it is negative.

$I^2$ has four edges. $\varphi$ sends the bottom edge to itself by the identity map, it sends
the top edge to the piecewise linear V-shaped path in the plane $z = 1$ from $(0, 1, 1)$
to $(1/2, 1 - m/2, 1)$ to $(1, 1, 1)$. Finally $\varphi$ sends the left and right edges to lines of
slope 1 that join $(0, 0, 0)$ to $(0, 1, 1)$ and $(1, 0, 0)$ to $(1, 1, 1)$. Figure 124 shows the
projection of the cell on the $xy$-plane.
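The claim about the sign of the $xy$-area can be checked directly: on each half of the square the Jacobian $\partial(\varphi_1,\varphi_2)/\partial(s,t)$ is $1 - ms$ or $1 - m + ms$, and integrating gives $1 - m/4$. The sketch below does the same thing numerically, with partial derivatives estimated by central differences; the function names, the grid size, and the step `h` are assumptions of this illustration.

```python
def make_cell(m):
    """The piecewise smooth 2-cell of the example, with slope parameter m."""
    def phi(s, t):
        if s <= 0.5:
            return (s, t * (1 - m * s), t)
        return (s, t * (1 - m + m * s), t)
    return phi


def signed_area(phi, I=(0, 1), n=200, h=1e-5):
    """Approximate the integral over the unit square of d(phi_I)/d(s,t).

    I selects two components of phi (0-based). Partial derivatives are
    estimated by central differences, the integral by the midpoint rule.
    """
    i, j = I
    total = 0.0
    for a in range(n):
        s = (a + 0.5) / n
        for b in range(n):
            t = (b + 0.5) / n
            ds = [(p - q) / (2 * h) for p, q in zip(phi(s + h, t), phi(s - h, t))]
            dt = [(p - q) / (2 * h) for p, q in zip(phi(s, t + h), phi(s, t - h))]
            # 2x2 Jacobian determinant of (phi_i, phi_j) with respect to (s, t)
            total += (ds[i] * dt[j] - ds[j] * dt[i]) / (n * n)
    return total
```

With `I=(0, 1)` this is the $xy$-area; `signed_area(make_cell(4.0))` should be essentially zero and `signed_area(make_cell(8.0))` close to $-1$. An even `n` keeps the kink at $s = 1/2$ on a cell boundary, so no sample point straddles it.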



Figure 124 $\pi \circ \varphi$ fixes all points of the square’s lower edge, left edge, and
right edge. It sends the upper edge to the V-shaped path from $(0, 1)$ to
$(1, 1)$. For fixed $s$, $t \mapsto \pi \circ \varphi(s, t)$ is affine in $t$. Positive shadow area is lightly
shaded and negative shadow area heavily shaded. The total signed $xy$-area
of $\varphi$ is negative when $m > 4$. When $m > 2$ the cell $\varphi$ resembles a ship’s
prow.


Form Naturality 

It is a common error to confuse a cell, which is a smooth mapping, with its image,
which is a point set - but the error is fairly harmless.

36 Theorem Integrating a $k$-form over $k$-cells that differ by a reparameterization
produces the same answer up to a factor of $\pm 1$, and this factor of $\pm 1$ is determined
by whether the reparameterization preserves or reverses orientation.

Proof If $T$ is an orientation-preserving diffeomorphism of $I^k$ to itself then the
Jacobian $\partial T/\partial u$ is positive. The product determinant formula and the change of variables
formula for multiple integrals applied to $\omega = f\,dx_I$ give

\[ \begin{aligned} \int_{\varphi \circ T} \omega &= \int_{I^k} f(\varphi \circ T(u))\,\frac{\partial(\varphi \circ T)_I}{\partial u}\,du \\ &= \int_{I^k} f(\varphi \circ T(u)) \left( \frac{\partial \varphi_I}{\partial v} \right)_{v = T(u)} \frac{\partial T}{\partial u}\,du \\ &= \int_{I^k} f(\varphi(v))\,\frac{\partial \varphi_I}{\partial v}\,dv = \int_\varphi \omega. \end{aligned} \]

Taking sums shows that the equation $\int_{\varphi \circ T} \omega = \int_\varphi \omega$ continues to hold for all $\omega \in \Omega^k$.
If $T$ reverses orientation, its Jacobian is negative. In the change of variables formula
appears the absolute value of the Jacobian, which causes $\int_{\varphi \circ T} \omega$ to change sign. □


A particular case of the previous theorem concerns line integrals in the plane.
The integral of a 1-form over a curve $C$ does not depend on how $C$ is parameterized.
If we first parameterize $C$ using a parameter $t \in [0, 1]$ and then reparameterize it by
arclength $s \in [0, L]$ where $L$ is the length of $C$ and the orientation of $C$ remains the
same then integrals of 1-forms are unaffected,

\[ \int_0^1 f(x(t), y(t))\,\frac{dx(t)}{dt}\,dt = \int_0^L f(x(s), y(s))\,\frac{dx(s)}{ds}\,ds \]
\[ \int_0^1 g(x(t), y(t))\,\frac{dy(t)}{dt}\,dt = \int_0^L g(x(s), y(s))\,\frac{dy(s)}{ds}\,ds. \]

Form Names 

A $k$-tuple $I = (i_1, \dots, i_k)$ ascends if $i_1 < \cdots < i_k$.

37 Proposition Each $k$-form $\omega$ has a unique expression as a sum of simple $k$-forms
with ascending $k$-tuple indices,

\[ \omega = \sum_A f_A\,dx_A. \]

Moreover, the coefficient $f_A(x)$ in this “ascending presentation” of $\omega$ is determined
by the value of $\omega$ on small $k$-cells at $x$.

Proof Every $k$-tuple of distinct indices has a unique ascending rearrangement. The
other $k$-tuples correspond to the zero $k$-form. Using the signed commutativity
property of forms, we regroup and combine a sum of simple forms into terms in which the
indices ascend. This gives the existence of an ascending presentation $\omega = \sum f_A\,dx_A$.

Fix an ascending $k$-tuple $A$ and fix a point $x \in \mathbb{R}^n$. For $r > 0$ consider the
inclusion cell,

\[ \iota = \iota_{r,x} : u \mapsto x + rL(u) \]

where $L$ is the linear inclusion map that sends $\mathbb{R}^k$ to the $x_A$-plane. $\iota$ sends $I^k$ to a
cube in the $x_A$-plane at $x$. As $r \to 0$, the cube shrinks to $x$. If $I$ ascends then the
Jacobian of $\iota$ is

\[ \frac{\partial \iota_I}{\partial u} = \begin{cases} r^k & \text{if } I = A \\ 0 & \text{if } I \ne A. \end{cases} \]

Thus, if $I \ne A$ then $f_I\,dx_I(\iota) = 0$ and

\[ \omega(\iota) = f_A\,dx_A(\iota) = r^k \int_{I^k} f_A(\iota(u))\,du. \]

Continuity of $f_A$ implies that

\[ f_A(x) = \lim_{r \to 0} \frac{1}{r^k}\,\omega(\iota), \tag{20} \]

which is how the value of $\omega$ on small $k$-cells at $x$ determines the coefficient $f_A(x)$. □
38 Corollary If $k > n$ then $\Omega^k(\mathbb{R}^n) = 0$.


Proof There are no ascending $k$-tuples of integers in $\{1, \dots, n\}$.


□ 


Moral A form may have many names, but it has a unique ascending name. Therefore 
if definitions or properties of a form are to be discussed in terms of a form’s name 
then the use of ascending names avoids ambiguity. 


Wedge Products 

Let $\alpha$ be a $k$-form and $\beta$ be an $\ell$-form. Write them in their ascending presentations,
$\alpha = \sum_I a_I\,dx_I$ and $\beta = \sum_J b_J\,dx_J$. Their wedge product is the $(k + \ell)$-form

\[ \alpha \wedge \beta = \sum_{I,J} a_I b_J\,dx_{IJ} \]

where $I = (i_1, \dots, i_k)$, $J = (j_1, \dots, j_\ell)$, $IJ = (i_1, \dots, i_k, j_1, \dots, j_\ell)$, and the sum
is taken over all ascending $I$, $J$. The use of ascending presentations avoids name
ambiguity although Theorem 39 makes the ambiguity moot. A particular case of the
definition is

\[ dx_1 \wedge dx_2 = dx_{(1,2)}. \]

39 Theorem The wedge product $\wedge : \Omega^k \times \Omega^\ell \to \Omega^{k+\ell}$ satisfies four natural
conditions:

(a) distributivity: $(\alpha + \beta) \wedge \gamma = \alpha \wedge \gamma + \beta \wedge \gamma$ and $\gamma \wedge (\alpha + \beta) = \gamma \wedge \alpha + \gamma \wedge \beta$.

(b) insensitivity to presentations: $\alpha \wedge \beta = \sum_{I,J} a_I b_J\,dx_{IJ}$ for general presentations
$\alpha = \sum a_I\,dx_I$ and $\beta = \sum b_J\,dx_J$.

(c) associativity: $\alpha \wedge (\beta \wedge \gamma) = (\alpha \wedge \beta) \wedge \gamma$.

(d) signed commutativity: $\beta \wedge \alpha = (-1)^{k\ell}\,\alpha \wedge \beta$ when $\alpha$ is a $k$-form and $\beta$ is an
$\ell$-form. In particular $dx \wedge dy = -dy \wedge dx$.

40 Lemma The wedge product of basic forms satisfies

\[ dx_I \wedge dx_J = dx_{IJ}. \]

Proof #1 See Exercise 55. □

Proof #2 If $I$ and $J$ ascend then the lemma merely repeats the definition of the
wedge product. Otherwise, let $\pi$ and $\rho$ be permutations that make $\pi I$ and $\rho J$
nondescending. Call $\sigma$ the permutation of $IJ$ that is $\pi$ on the first $k$ terms and $\rho$ on the
last $\ell$. The sign of $\sigma$ is $\operatorname{sgn}(\pi)\operatorname{sgn}(\rho)$ and

\[ dx_I \wedge dx_J = \operatorname{sgn}(\pi)\operatorname{sgn}(\rho)\,dx_{\pi I} \wedge dx_{\rho J} = \operatorname{sgn}(\sigma)\,dx_{\sigma(IJ)} = dx_{IJ}. \] □


Proof of Theorem 39 (a) To check distributivity, suppose that $\alpha = \sum a_I\,dx_I$ and
$\beta = \sum b_I\,dx_I$ are $k$-forms, while $\gamma = \sum c_J\,dx_J$ is an $\ell$-form and all sums are ascending
presentations. Then

\[ \sum_I (a_I + b_I)\,dx_I \]

is the ascending presentation of $\alpha + \beta$ (this is the only trick in the proof) and

\[ (\alpha + \beta) \wedge \gamma = \sum_{I,J} (a_I + b_I) c_J\,dx_{IJ} = \sum_{I,J} a_I c_J\,dx_{IJ} + \sum_{I,J} b_I c_J\,dx_{IJ}, \]

which is $\alpha \wedge \gamma + \beta \wedge \gamma$, and verifies distributivity on the left. Distributivity on the
right is checked in a similar way.

(b) Let $\sum a_I\,dx_I$ and $\sum b_J\,dx_J$ be general nonascending presentations of $\alpha$ and $\beta$.
By distributivity and Lemma 40 we have

\[ \Big( \sum_I a_I\,dx_I \Big) \wedge \Big( \sum_J b_J\,dx_J \Big) = \sum_{I,J} a_I b_J\,dx_I \wedge dx_J = \sum_{I,J} a_I b_J\,dx_{IJ}. \]

(c) By (b), to check associativity we need not use ascending presentations. Thus
if $\alpha = \sum a_I\,dx_I$, $\beta = \sum b_J\,dx_J$, and $\gamma = \sum c_K\,dx_K$ then

\[ \alpha \wedge (\beta \wedge \gamma) = \Big( \sum_I a_I\,dx_I \Big) \wedge \Big( \sum_{J,K} b_J c_K\,dx_{JK} \Big) = \sum_{I,J,K} a_I b_J c_K\,dx_{IJK}, \]

which equals $(\alpha \wedge \beta) \wedge \gamma$.

(d) Associativity implies that it makes sense to write $dx_I$ and $dx_J$ as products
$dx_{i_1} \wedge \cdots \wedge dx_{i_k}$ and $dx_{j_1} \wedge \cdots \wedge dx_{j_\ell}$. Thus,

\[ dx_I \wedge dx_J = dx_{i_1} \wedge \cdots \wedge dx_{i_k} \wedge dx_{j_1} \wedge \cdots \wedge dx_{j_\ell}. \]

It takes $k\ell$ pair-transpositions to push each $dx_i$ past each $dx_j$, which implies

\[ dx_J \wedge dx_I = (-1)^{k\ell}\,dx_I \wedge dx_J. \]

Distributivity completes the proof of signed commutativity for general $\alpha$ and $\beta$. □
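The bookkeeping behind Theorem 39 can be tried out on forms with constant coefficients. In the sketch below a form is a dictionary sending an index tuple $I$ to the coefficient of $dx_I$; this representation, the restriction to constant (integer) coefficients, and the function names are simplifying assumptions, not the general definition.

```python
def sort_sign(index):
    """Return (sign, ascending tuple) for an index tuple; sign is 0 on a repeat."""
    idx = list(index)
    if len(set(idx)) < len(idx):
        return 0, ()
    sign = 1
    # Bubble sort: each adjacent transposition flips the sign.
    for i in range(len(idx)):
        for j in range(len(idx) - 1 - i):
            if idx[j] > idx[j + 1]:
                idx[j], idx[j + 1] = idx[j + 1], idx[j]
                sign = -sign
    return sign, tuple(idx)


def wedge(alpha, beta):
    """Sum of a_I * b_J * dx_{IJ}, returned in its ascending presentation."""
    out = {}
    for I, a in alpha.items():
        for J, b in beta.items():
            sign, asc = sort_sign(I + J)
            if sign != 0:
                out[asc] = out.get(asc, 0) + sign * a * b
    return {k: v for k, v in out.items() if v != 0}


dx = {(1,): 1}
dy = {(2,): 1}

# Hypothetical test forms: a 1-form and a 2-form on R^4.
alpha = {(1,): 2, (3,): 5}
beta = {(2, 4): 3, (1, 2): 7}
```

Here `wedge(dx, dy)` is `{(1, 2): 1}`, `wedge(dy, dx)` is `{(1, 2): -1}`, and `wedge(dx, dx)` is the empty dictionary, i.e., the zero form. Since $k\ell = 2$ for `alpha` and `beta`, signed commutativity predicts `wedge(beta, alpha) == wedge(alpha, beta)`.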


The Exterior Derivative 


Differentiating a form is subtle. The idea, as with all derivatives, is to imagine
how the form changes under small variations of the point at which it is evaluated.

A 0-form is a smooth function $f(x)$. Its exterior derivative is by definition the
functional on paths $\varphi : [0, 1] \to \mathbb{R}^n$,

\[ df : \varphi \mapsto f(\varphi(1)) - f(\varphi(0)). \]

41 Proposition $df$ is a 1-form; when $n = 2$ it is expressed as

\[ df = \frac{\partial f}{\partial x}\,dx + \frac{\partial f}{\partial y}\,dy. \]

In particular, $d(x) = dx$.


Proof When no abuse of notation occurs we use calculus shorthand and write $f_x =
\partial f/\partial x$, $f_y = \partial f/\partial y$. Applied to $\varphi$, the form $\omega = f_x\,dx + f_y\,dy$ produces the number

\[ \omega(\varphi) = \int_0^1 \Big( f_x(\varphi(t))\,\frac{dx(t)}{dt} + f_y(\varphi(t))\,\frac{dy(t)}{dt} \Big)\,dt. \]

By the Chain Rule the integrand is the derivative of $f \circ \varphi(t)$, so the Fundamental
Theorem of Calculus implies that $\omega(\varphi) = f(\varphi(1)) - f(\varphi(0))$. Therefore $df = \omega$ as
claimed. □




Remark Just as with the 1-form $dx$, the 1-form $df$ measures the net $f$-variation of
a path from $p$ to $q$. It is the difference $f(q) - f(p)$.

Definition Fix $k \ge 1$. Let $\sum f_I\,dx_I$ be the ascending presentation of a $k$-form $\omega$.
The exterior derivative of $\omega$ is the $(k + 1)$-form

\[ d\omega = \sum_I df_I \wedge dx_I. \]

The sum is taken over all ascending $k$-tuples $I$. The derivative of $\omega = f\,dx_I$ amounts
to how the coefficient $f$ changes. If $f$ is constant then $d\omega = 0$.

Use of the ascending presentation makes the definition unambiguous although
Theorem 42 makes this moot. Since $df_I$ is a 1-form and $dx_I$ is a $k$-form, $d\omega$ is indeed a
$(k + 1)$-form. For example, we get

\[ d(f\,dx + g\,dy) = (g_x - f_y)\,dx \wedge dy. \]
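The example can be checked in a few lines, using only the definition of $d$, Proposition 41, distributivity of the wedge product, and the rules $dx \wedge dx = dy \wedge dy = 0$ and $dy \wedge dx = -dx \wedge dy$:

```latex
\begin{aligned}
d(f\,dx + g\,dy) &= df \wedge dx + dg \wedge dy \\
&= (f_x\,dx + f_y\,dy) \wedge dx + (g_x\,dx + g_y\,dy) \wedge dy \\
&= f_y\,dy \wedge dx + g_x\,dx \wedge dy \\
&= (g_x - f_y)\,dx \wedge dy.
\end{aligned}
```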

42 Theorem Exterior differentiation $d : \Omega^k \to \Omega^{k+1}$ satisfies four natural
conditions.

(a) It is linear: $d(\alpha + c\beta) = d\alpha + c\,d\beta$.

(b) It is insensitive to presentation: If $\sum f_I\,dx_I$ is a general presentation of $\omega$ then
$d\omega = \sum df_I \wedge dx_I$.

(c) It obeys a product rule: If $\alpha$ is a $k$-form and $\beta$ is an $\ell$-form then

\[ d(\alpha \wedge \beta) = d\alpha \wedge \beta + (-1)^k\,\alpha \wedge d\beta. \]

(d) $d^2 = 0$. That is, $d(d\omega) = 0$ for all $\omega \in \Omega^k$.

Proof (a) Linearity is easy and is left for the reader as Exercise 57.

(b) Let $\pi$ make $\pi I$ ascending. Linearity of $d$ and associativity of $\wedge$ give

\[ d(f_I\,dx_I) = \operatorname{sgn}(\pi)\,d(f_I\,dx_{\pi I}) = \operatorname{sgn}(\pi)\,d(f_I) \wedge dx_{\pi I} = d(f_I) \wedge dx_I. \]

Linearity of $d$ promotes the result from simple forms to general ones.

(c) The ordinary Leibniz product rule for differentiating functions of two variables
gives

\[ \begin{aligned} d(fg) &= \frac{\partial(fg)}{\partial x}\,dx + \frac{\partial(fg)}{\partial y}\,dy \\ &= f_x g\,dx + f_y g\,dy + f g_x\,dx + f g_y\,dy \end{aligned} \]

which is $g\,df + f\,dg$, and verifies (c) for 0-forms in $\mathbb{R}^2$. The higher-dimensional case
is similar. Next we consider simple forms $\alpha = f\,dx_I$ and $\beta = g\,dx_J$. Then

\[ \begin{aligned} d(\alpha \wedge \beta) &= d(fg\,dx_{IJ}) = (g\,df + f\,dg) \wedge dx_{IJ} \\ &= (df \wedge dx_I) \wedge (g\,dx_J) + (-1)^k (f\,dx_I) \wedge (dg \wedge dx_J) \\ &= d\alpha \wedge \beta + (-1)^k\,\alpha \wedge d\beta. \end{aligned} \]

Distributivity completes the proof for general $\alpha$ and $\beta$.

The proof of (d) is fun. We check it first for the special 0-form $x$. By Proposition 41
the exterior derivative of $x$ is $dx$ and in turn the exterior derivative of $dx$ is zero. For
$dx = 1\,dx$, $d1 = 0$, and by definition, $d(1\,dx) = d(1) \wedge dx = 0$. For the same reason,
$d(dx_I) = 0$.

Next we consider a smooth function $f : \mathbb{R}^2 \to \mathbb{R}$ and prove that $d^2 f = 0$. Since
$d^2 x = d^2 y = 0$ we have

\[ \begin{aligned} d^2 f &= d(f_x\,dx + f_y\,dy) = d(f_x) \wedge dx + d(f_y) \wedge dy \\ &= (f_{xx}\,dx + f_{xy}\,dy) \wedge dx + (f_{yx}\,dx + f_{yy}\,dy) \wedge dy \\ &= f_{xx}\,dx \wedge dx + (f_{yx} - f_{xy})\,dx \wedge dy + f_{yy}\,dy \wedge dy = 0 \end{aligned} \]

since $dx \wedge dx = dy \wedge dy = 0$ and smoothness of $f$ implies $f_{xy} = f_{yx}$.

The fact that $d^2 = 0$ for functions easily gives the same result for forms. The
higher-dimensional case is similar. □

Pushforward and Pullback 

According to Theorem 36 forms behave naturally under composition on the right.
What about composition on the left? Let $T : \mathbb{R}^n \to \mathbb{R}^m$ be a smooth transformation.
It induces a natural transformation on $k$-cells, $T_* : C_k(\mathbb{R}^n) \to C_k(\mathbb{R}^m)$, called the
pushforward of $T$. It is defined as

\[ T_* : \varphi \mapsto T \circ \varphi. \]

A $k$-cell $\varphi$ in $\mathbb{R}^n$ gets pushed forward to become a $k$-cell in $\mathbb{R}^m$. Dual to the
pushforward is the pullback $T^* : C^k(\mathbb{R}^m) \to C^k(\mathbb{R}^n)$. It is defined as

\[ T^* : Y \mapsto Y \circ T_*. \]

A functional $Y$ that sends $k$-cells in $\mathbb{R}^m$ to $\mathbb{R}$ gets pulled back to become a functional
on $k$-cells in $\mathbb{R}^n$,

\[ T^*Y : \varphi \mapsto Y(T \circ \varphi). \]

The pushforward $T_*$ goes the same direction as $T$, from $\mathbb{R}^n$ to $\mathbb{R}^m$, while the pullback
$T^*$ goes the opposite way. The pushforward/pullback duality is summarized by the
formula

\[ (T^*Y)(\varphi) = Y(T_*\varphi). \]

$C^k(\mathbb{R}^m)$ and $C^k(\mathbb{R}^n)$ are vector spaces according to the addition and scalar
multiplication rules

\[ (Y + \lambda W)(\varphi) = Y(\varphi) + \lambda W(\varphi), \]

and the pullback $T^* : C^k(\mathbb{R}^m) \to C^k(\mathbb{R}^n)$ is linear. For if $Y, W \in C^k(\mathbb{R}^m)$, $\lambda \in \mathbb{R}$,
and $\varphi \in C_k(\mathbb{R}^n)$ then

\[ \begin{aligned} (T^*(Y + \lambda W))(\varphi) &= (Y + \lambda W)(T \circ \varphi) = Y(T \circ \varphi) + \lambda W(T \circ \varphi) \\ &= T^*Y(\varphi) + \lambda T^*W(\varphi). \end{aligned} \]

These functionals $Y, W$ need not be forms - linearity of the pullback has nothing to
do with forms. The same applies to composition. If $T : \mathbb{R}^n \to \mathbb{R}^m$ and $S : \mathbb{R}^m \to \mathbb{R}^p$
are smooth then

\[ (S \circ T)^* = T^* \circ S^* : C^k(\mathbb{R}^p) \to C^k(\mathbb{R}^n). \]

Although this has nothing to do with forms, Figure 125 is what to remember.
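Because pushforward and pullback are nothing but composition, they can be written as higher-order functions. In the sketch below cells and functionals are plain Python callables; the maps `T` and `S`, the functional `endpoint_distance`, and the path are all made-up examples, chosen to stress that the functional need not be a form.

```python
def pushforward(T):
    """T_*: send a cell phi to the cell T o phi."""
    return lambda phi: (lambda u: T(phi(u)))


def pullback(T):
    """T^*: send a functional Y on cells to the functional phi -> Y(T o phi)."""
    push = pushforward(T)
    return lambda Y: (lambda phi: Y(push(phi)))


def compose(S, T):
    """The composite map S o T."""
    return lambda x: S(T(x))


# Hypothetical maps T: R^2 -> R^3 and S: R^3 -> R^2.
T = lambda p: (p[0] + p[1], p[0] * p[1], 2.0 * p[1])
S = lambda q: (q[0] - q[2], q[1] * q[1])


def endpoint_distance(phi):
    """A functional on paths that is not a form: distance between the endpoints."""
    a, b = phi(0.0), phi(1.0)
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


path = lambda t: (t, t * t)  # a 1-cell in R^2

left = pullback(compose(S, T))(endpoint_distance)(path)
right = pullback(T)(pullback(S)(endpoint_distance))(path)
```

Both `left` and `right` evaluate `endpoint_distance` on the same path $S \circ T \circ \varphi$, which is the content of $(S \circ T)^* = T^* \circ S^*$.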

43 Theorem Pullbacks of forms obey the following three natural conditions.

(a) The pullback of a form is a form. In particular, $T^*(dy_I) = dT_I$ and $T^*(f\,dy_I) =
T^*f\,dT_I$, where $dT_I = dT_{i_1} \wedge \cdots \wedge dT_{i_k}$.

(b) The pullback preserves wedge products, $T^*(\alpha \wedge \beta) = T^*\alpha \wedge T^*\beta$.

(c) The pullback commutes with the exterior derivative, $dT^* = T^*d$.

Proof (a) We rely on a nontrivial result in linear algebra, the Cauchy-Binet
Formula, which concerns the determinant of a product matrix $AB = C$, where $A$ is
$k \times n$ and $B$ is $n \times k$. See Appendix E.
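In its standard form the Cauchy-Binet Formula says that $\det(AB) = \sum_J \det A^J \det B_J$, where $J$ ranges over the ascending $k$-tuples in $\{1, \dots, n\}$, $A^J$ consists of the columns of $A$ indexed by $J$, and $B_J$ of the rows of $B$ indexed by $J$. A small check, with made-up integer matrices and helper names that are not from the text:

```python
from itertools import combinations


def det(M):
    """Determinant by cofactor expansion along the first row (fine for small k)."""
    if len(M) == 1:
        return M[0][0]
    total = 0
    for c in range(len(M)):
        minor = [row[:c] + row[c + 1:] for row in M[1:]]
        total += (-1) ** c * M[0][c] * det(minor)
    return total


def matmul(A, B):
    """Product of a k x n matrix and an n x k matrix, as lists of rows."""
    return [[sum(A[i][s] * B[s][j] for s in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def cauchy_binet_sum(A, B):
    """Sum over ascending k-tuples J of det(columns J of A) * det(rows J of B)."""
    k, n = len(A), len(B)
    total = 0
    for J in combinations(range(n), k):
        A_J = [[row[j] for j in J] for row in A]
        B_J = [B[j] for j in J]
        total += det(A_J) * det(B_J)
    return total


# Hypothetical integer matrices: A is 2 x 3 and B is 3 x 2.
A = [[1, 2, 3], [4, 5, 6]]
B = [[7, 8], [9, 10], [11, 12]]
```

For these matrices both `det(matmul(A, B))` and `cauchy_binet_sum(A, B)` equal 36, the three ascending pairs contributing 6, 24, and 6.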

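Figure 125 $k$-cells in $\mathbb{R}^n$ get pushed forward to $\mathbb{R}^m$ while $k$-forms on $\mathbb{R}^m$
get pulled back to $\mathbb{R}^n$. The formula is $T^*(\omega)(\varphi) = \omega(T \circ \varphi)$.

In terms of Jacobians, the Cauchy-Binet Formula asserts that if the maps $\varphi :
\mathbb{R}^k \to \mathbb{R}^n$ and $\psi : \mathbb{R}^n \to \mathbb{R}^k$ are smooth then the composite $\phi = \psi \circ \varphi : \mathbb{R}^k \to \mathbb{R}^k$
satisfies

\[ \frac{\partial \phi}{\partial u} = \sum_J \frac{\partial \psi}{\partial x_J}\,\frac{\partial \varphi_J}{\partial u} \]

where the Jacobian $\partial\psi/\partial x_J$ is evaluated at $x = \varphi(u)$ and $J$ ranges through all
ascending $k$-tuples in $\{1, \dots, n\}$. Then the pullback of a simple $k$-form on $\mathbb{R}^m$ is the
functional on $C_k(\mathbb{R}^n)$,

\[ \begin{aligned} T^*(f\,dy_I) : \varphi &\mapsto f\,dy_I(T \circ \varphi) \\ &= \int_{I^k} f(T \circ \varphi(u))\,\frac{\partial(T \circ \varphi)_I}{\partial u}\,du \\ &= \sum_J \int_{I^k} f(T \circ \varphi(u)) \left( \frac{\partial T_I}{\partial x_J} \right)_{x = \varphi(u)} \frac{\partial \varphi_J}{\partial u}\,du. \end{aligned} \]

(The Cauchy-Binet Formula is used to go from the second to third lines.) This implies

\[ T^*(f\,dy_I) = \sum_J T^*f\,\frac{\partial T_I}{\partial x_J}\,dx_J \tag{21} \]

is a $k$-form. $\Omega^k(\mathbb{R}^n)$ and $\Omega^k(\mathbb{R}^m)$ are vector subspaces of $C^k(\mathbb{R}^n)$ and $C^k(\mathbb{R}^m)$.
Linearity of $T^*$ promotes (21) to general forms, which completes the proof that the
pullback of a form is a form. Thus $T^* : \Omega^k(\mathbb{R}^m) \to \Omega^k(\mathbb{R}^n)$. It remains to check that
$T^*(dy_I) = dT_I$. If $I = (i_1, \dots, i_k)$ then distributivity of the wedge product and the
definition of the exterior derivative of a function imply that

\[ \begin{aligned} dT_I &= dT_{i_1} \wedge \cdots \wedge dT_{i_k} \\ &= \Big( \sum_{s_1 = 1}^n \frac{\partial T_{i_1}}{\partial x_{s_1}}\,dx_{s_1} \Big) \wedge \cdots \wedge \Big( \sum_{s_k = 1}^n \frac{\partial T_{i_k}}{\partial x_{s_k}}\,dx_{s_k} \Big) \\ &= \sum_{s_1, \dots, s_k = 1}^n \frac{\partial T_{i_1}}{\partial x_{s_1}} \cdots \frac{\partial T_{i_k}}{\partial x_{s_k}}\,dx_{s_1} \wedge \cdots \wedge dx_{s_k}. \end{aligned} \]

The indices $i_1, \dots, i_k$ are fixed. All terms with repeated dummy indices $s_1, \dots, s_k$
are zero, so the sum is really taken as $(s_1, \dots, s_k)$ varies in the set of $k$-tuples with
no repeated entry, and then we know that $(s_1, \dots, s_k)$ can be expressed uniquely
as $(s_1, \dots, s_k) = \pi J$ for an ascending $J = (j_1, \dots, j_k)$ and a permutation $\pi$. Also,
$dx_{s_1} \wedge \cdots \wedge dx_{s_k} = \operatorname{sgn}(\pi)\,dx_J$. This gives

\[ dT_I = \sum_J \Big( \sum_\pi \operatorname{sgn}(\pi)\,\frac{\partial T_{i_1}}{\partial x_{\pi(j_1)}} \cdots \frac{\partial T_{i_k}}{\partial x_{\pi(j_k)}} \Big)\,dx_J = \sum_J \frac{\partial T_I}{\partial x_J}\,dx_J \]

and hence $T^*(dy_I) = dT_I$. Here we used the description of the determinant from
Appendix E.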

(b) For 0-forms it is clear that the pullback of a product is the product of the
pullbacks, $T^*(fg) = T^*f\,T^*g$. Suppose that $\alpha$ is a simple $k$-form and $\beta$ is a simple
$\ell$-form. Then $\alpha = f\,dy_I$, $\beta = g\,dy_J$, and $\alpha \wedge \beta = fg\,dy_{IJ}$. By (a) we get

\[ T^*(\alpha \wedge \beta) = T^*(fg)\,dT_{IJ} = T^*f\,T^*g\,dT_I \wedge dT_J = T^*\alpha \wedge T^*\beta. \]

Wedge distributivity and pullback linearity complete the proof of (b).
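(c) If $\omega$ is a form of degree 0, $\omega = f \in \Omega^0(\mathbb{R}^m)$, then

\[ \begin{aligned} T^*(df)(x) &= T^*\Big( \sum_{i=1}^m \frac{\partial f}{\partial y_i}\,dy_i \Big) = \sum_{i=1}^m T^*\Big( \frac{\partial f}{\partial y_i} \Big)\,T^*(dy_i) \\ &= \sum_{i=1}^m \sum_{j=1}^n \left( \frac{\partial f(y)}{\partial y_i} \right)_{y = T(x)} \frac{\partial T_i}{\partial x_j}\,dx_j, \end{aligned} \]

which is merely the Chain Rule expression for $d(f \circ T) = d(T^*f)$,

\[ d(f \circ T) = \sum_{j=1}^n \frac{\partial f(T(x))}{\partial x_j}\,dx_j. \]

Thus, $T^*d\omega = dT^*\omega$ for 0-forms.


Next consider a simple $k$-form $\omega = f\,dy_I$ with $k \ge 1$. Using (a), the degree-zero
case, and the wedge differentiation formula, we get

\[ \begin{aligned} d(T^*\omega) &= d(T^*f\,dT_I) \\ &= d(T^*f) \wedge dT_I + (-1)^0\,T^*f \wedge d(dT_I) \\ &= T^*(df) \wedge dT_I \\ &= T^*(df \wedge dy_I) \\ &= T^*(d\omega). \end{aligned} \]

Linearity promotes this to general $k$-forms and completes the proof of (c).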


□ 


9 The General Stokes Formula 

In this section we establish the general Stokes formula as

\[ \int_\varphi d\omega = \int_{\partial\varphi} \omega \]

where $\omega \in \Omega^k(\mathbb{R}^n)$ and $\varphi \in C_{k+1}(\mathbb{R}^n)$. Then, as special cases, we reel off the standard
formulas of vector calculus. Finally, we discuss antidifferentiation of forms and briefly
introduce de Rham cohomology.

First we verify Stokes’ formula on a cube, and then get the general case by means
of the pullback.

Definition A $k$-chain is a formal linear combination† of $k$-cells,

\[ \Phi = \sum_{j=1}^N a_j \varphi_j, \]

where $a_1, \dots, a_N$ are real constants. The integral of a $k$-form $\omega$ over $\Phi$ is

\[ \int_\Phi \omega = \sum_{j=1}^N a_j \int_{\varphi_j} \omega. \]

†To be more precise, but no more informative, we form an infinite-dimensional vector space $V$
using an uncountable basis consisting of all $k$-cells in $\mathbb{R}^n$. Then $\Phi = \sum a_j \varphi_j$ is a vector in $V$.

Definition The boundary of a $(k+1)$-cell $\varphi$ is the $k$-chain

\[ \partial\varphi = \sum_{j=1}^{k+1} (-1)^{j+1} \big( \varphi \circ \iota^{j,1} - \varphi \circ \iota^{j,0} \big) \]

where

\[ \iota^{j,0} : (u_1, \dots, u_k) \mapsto (u_1, \dots, u_{j-1}, 0, u_j, \dots, u_k) \]
\[ \iota^{j,1} : (u_1, \dots, u_k) \mapsto (u_1, \dots, u_{j-1}, 1, u_j, \dots, u_k) \]

are the $j$th “rear inclusion” $k$-cell and “front inclusion” $k$-cell of $I^{k+1}$. See
Figure 126. (Note that $\partial\varphi$ is indeed a formal linear combination of $k$-cells.) As
shorthand we write $\partial\varphi$ as

\[ \partial\varphi = \sum_{j=1}^{k+1} (-1)^{j+1} \delta^j \]

where $\delta^j = \varphi \circ \iota^{j,1} - \varphi \circ \iota^{j,0}$ is the $j$th dipole of $\varphi$.

Figure 126 The rear inclusions $\iota^{1,0}$ and $\iota^{2,0}$ are the $x$-rearface and the
$y$-rearface. The front inclusion $\iota^{3,1}$ is the $z$-frontface, the top of the cube.

44 Stokes’ Formula for a Cube Assume that $k + 1 = n$. If $\omega \in \Omega^k(\mathbb{R}^n)$ and
$\iota : I^n \to \mathbb{R}^n$ is the identity-inclusion $n$-cell in $\mathbb{R}^n$ then

\[ \int_{\iota} d\omega = \int_{\partial\iota} \omega. \]

Proof Write $\omega$ as

\[ \omega = \sum_{i=1}^{n} f_i(x)\, dx_1 \wedge \cdots \wedge \widehat{dx_i} \wedge \cdots \wedge dx_n, \]

where the hat above the term $dx_i$ is standard notation to indicate that $dx_i$ is deleted.
The exterior derivative of $\omega$ is

\[ d\omega = \sum_{i=1}^{n} df_i \wedge dx_1 \wedge \cdots \wedge \widehat{dx_i} \wedge \cdots \wedge dx_n = \sum_{i=1}^{n} (-1)^{i+1} \frac{\partial f_i}{\partial x_i}\, dx_1 \wedge \cdots \wedge dx_n, \]

which implies that

\[ \int_{\iota} d\omega = \sum_{i=1}^{n} (-1)^{i+1} \int_{I^n} \frac{\partial f_i}{\partial x_i}\, du. \]

Deleting the $j$th component of the rear $j$th face gives the $k$-tuple $(u_1, \dots, u_k)$,
while deleting any other component gives a $k$-tuple with a component that remains
constant as $u$ varies. The same is true of the $j$th front face. Thus the Jacobians are

\[ \frac{\partial(\iota^{j,0})_I}{\partial u} = \frac{\partial(\iota^{j,1})_I}{\partial u} = \begin{cases} 1 & \text{if } I = (1, \dots, \hat{j}, \dots, n) \\ 0 & \text{otherwise,} \end{cases} \]

and so the $j$th dipole integral of $f_i\, dx_1 \wedge \cdots \wedge \widehat{dx_i} \wedge \cdots \wedge dx_n$ is zero except when $i = j$, and in that case

\[ \int_{\delta^j} \omega = \int_0^1 \cdots \int_0^1 \big( f_j(u_1, \dots, u_{j-1}, 1, u_j, \dots, u_k) - f_j(u_1, \dots, u_{j-1}, 0, u_j, \dots, u_k) \big)\, du_1 \dots du_k. \]

By the Fundamental Theorem of Calculus we can substitute the integral of a derivative
for the $f_j$ difference; and by Fubini’s Theorem the order of integration in ordinary
multiple integration is irrelevant. This gives

\[ \int_{\delta^j} \omega = \int_0^1 \cdots \int_0^1 \frac{\partial f_j(x)}{\partial x_j}\, dx_1 \dots dx_n, \]

so the alternating dipole sum $\sum (-1)^{j+1} \int_{\delta^j} \omega$ equals $\int_{\iota} d\omega$.


□ 
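The smallest interesting case, $n = 2$ and $k = 1$, can be checked numerically. The following is a minimal sketch, not part of the proof: the coefficient functions `f1`, `f2` are hypothetical choices made only for illustration, and the integrals are approximated by the midpoint rule. With $\omega = f_1\,dx_2 + f_2\,dx_1$ the alternating dipole sum is $\int_0^1 (f_1(1,u) - f_1(0,u))\,du - \int_0^1 (f_2(u,1) - f_2(u,0))\,du$, and $d\omega = (\partial f_1/\partial x_1 - \partial f_2/\partial x_2)\,dx_1 \wedge dx_2$.

```python
import math

def midpoint(func, n=400):
    """Midpoint-rule approximation of the integral of func over [0, 1]."""
    h = 1.0 / n
    return h * sum(func((i + 0.5) * h) for i in range(n))

# Hypothetical coefficients of omega = f1 dx2 + f2 dx1 on the unit square.
def f1(x1, x2):
    return math.sin(x1) * x2

def f2(x1, x2):
    return x1 * x2 ** 2

# Their partial derivatives d f1 / d x1 and d f2 / d x2, computed by hand.
def d1f1(x1, x2):
    return math.cos(x1) * x2

def d2f2(x1, x2):
    return 2.0 * x1 * x2

def dipole_sum():
    """Alternating sum of the two dipole integrals of omega."""
    first = midpoint(lambda u: f1(1.0, u) - f1(0.0, u))   # j = 1, sign +1
    second = midpoint(lambda u: f2(u, 1.0) - f2(u, 0.0))  # j = 2, sign -1
    return first - second

def integral_of_d_omega():
    """Integral of d(omega) over the unit square as an iterated integral."""
    return midpoint(lambda x1: midpoint(lambda x2: d1f1(x1, x2) - d2f2(x1, x2)))
```

For these particular coefficients both quantities should agree with $\tfrac12 \sin 1 - \tfrac12$ up to the quadrature error.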
45 Stokes’ Formula for a General $k$-cell Assume that $k + 1 = n$. If $\omega \in \Omega^k(\mathbb{R}^n)$
and if $\varphi \in C_{k+1}(\mathbb{R}^n)$ then

\[ \int_{\varphi} d\omega = \int_{\partial\varphi} \omega. \]

Proof Using the pullback definition and applying (c) of Theorem 43 when $T = \varphi :
I^{k+1} \to \mathbb{R}^n$ and $\iota : I^{k+1} \to \mathbb{R}^{k+1}$ is the identity-inclusion gives

\[ \int_{\varphi} d\omega = \int_{\varphi \circ \iota} d\omega = \int_{\iota} \varphi^* d\omega = \int_{\iota} d(\varphi^*\omega) = \int_{\partial\iota} \varphi^*\omega = \int_{\partial\varphi} \omega. \qquad \Box \]

Remark The assumption $k = n - 1$ in Theorem 44 and Corollary 45 makes the
notation simpler, but the same assertions and proofs are valid for all $k$, $0 \le k \le n - 1$.


Stokes’ Formula on Manifolds 

If $M \subset \mathbb{R}^n$ divides into $(k+1)$-cells diffeomorphic to $I^{k+1}$ and its boundary
divides into $k$-cells diffeomorphic to $I^k$ as shown in Figure 127, then there is a version
of Stokes’ Formula for $M$. Namely, if $\omega$ is a $k$-form then

\[ \int_{M} d\omega = \int_{\partial M} \omega. \]

It is required that the boundary $k$-cells which are interior to $M$ cancel each other
out. This prohibits $M$ being the Möbius band and other nonorientable sets. The
$(k+1)$-cells “tile” $M$.



Figure 127 Manifolds of 2-cells diffeomorphic to $I^2$. The boundary of $M$,
drawn darker, may have several connected components.
Vector Calculus

The Fundamental Theorem of Calculus can be viewed as a special case of Stokes’
Formula

\[ \int_{M} d\omega = \int_{\partial M} \omega \]

by taking $M = [a, b] \subset \mathbb{R}^1$ and $\omega = f$. The integral of $\omega$ over the 0-chain $\partial M = b - a$
is $f(b) - f(a)$, while the integral of $d\omega$ over $M$ is $\int_a^b f'(x)\,dx$. Likewise, if $f : \mathbb{R}^2 \to \mathbb{R}$
is smooth then the integral of the 1-form $df = f_x\,dx + f_y\,dy$ is “path independent” in
the sense that if $\varphi$, $\psi$ are paths from $p$ to $q$ then

\[ \int_{\varphi} df = \int_{\psi} df. \]

After all, paths are 1-cells and both integrals equal $f(q) - f(p)$. The same holds in
$\mathbb{R}^3$ and $\mathbb{R}^n$.

Second, Green’s Formula in the plane,

\[ \iint_{D} (g_x - f_y)\,dx\,dy = \int_{C} f\,dx + g\,dy, \]

is also a special case when we take $\omega = f\,dx + g\,dy$. Here, the region $D$ is bounded
by the curve $C$. It is a manifold of 2-cells in the plane.


Third, the Gauss Divergence Theorem

\[ \iiint_{D} \operatorname{div} F = \iint_{S} \operatorname{flux} F, \]

is a consequence of Stokes’ Formula. Here, $F = (f, g, h)$ is a smooth vector field
defined on $U \subset \mathbb{R}^3$. (The notation indicates that $f$ is the $x$-component of $F$, $g$ is its
$y$-component, and $h$ is its $z$-component.) The divergence of $F$ is the scalar function

\[ \operatorname{div} F = f_x + g_y + h_z. \]

If $\varphi$ is a 2-cell in $U$ then the integral

\[ \int_{\varphi} f\,dy \wedge dz + g\,dz \wedge dx + h\,dx \wedge dy \]

is the flux of $F$ across $\varphi$. Let $S$ be a compact manifold of 2-cells. The total flux
across $S$ is the sum of the flux across its 2-cells. If $S$ bounds a region $D \subset U$ then
the Gauss Divergence Theorem is just Stokes’ Formula with

\[ \omega = f\,dy \wedge dz + g\,dz \wedge dx + h\,dx \wedge dy. \]

For $d\omega = \operatorname{div} F \; dx \wedge dy \wedge dz$.

Finally, the curl of a vector field $F = (f, g, h)$ is the vector field

\[ (h_y - g_z,\; f_z - h_x,\; g_x - f_y). \]

Applying Stokes’ Formula to the form $\omega = f\,dx + g\,dy + h\,dz$ gives

\[ \int_{S} (h_y - g_z)\,dy \wedge dz + (f_z - h_x)\,dz \wedge dx + (g_x - f_y)\,dx \wedge dy = \int_{C} f\,dx + g\,dy + h\,dz \]

where $S$ is a surface bounded by the closed curve $C$. The first integral is the total curl
across $S$, while the second is the circulation of $F$ at the boundary. Their equality
is Stokes’ Curl Theorem. See Corollaries 50 and 51 for further vector calculus
results.


Closed Forms and Exact Forms 

A form is closed if its exterior derivative is zero. It is exact if it is the exterior
derivative of some other form. Since $d^2 = 0$, every exact form is closed:

\[ \omega = d\alpha \;\Rightarrow\; d\omega = d(d\alpha) = 0. \]

When is the converse true? That is, when can we antidifferentiate a closed form
$\omega$ and find $\alpha$ such that $\omega = d\alpha$? If the forms are defined on $\mathbb{R}^n$ then the answer
“always” is the Poincaré Lemma. See below. But if the forms are defined on some
subset $U$ of $\mathbb{R}^n$, and if they do not extend to smooth forms defined on all of $\mathbb{R}^n$, then
the answer depends on the topology of $U$.

There is one case that should be familiar from calculus: Every closed 1-form
$\omega = f\,dx + g\,dy$ on $\mathbb{R}^2$ is exact. See Exercise 58. With more work the result holds for
every $U \subset \mathbb{R}^n$ that is simply connected in the sense that each closed curve in $U$
can be continuously shrunk to a point in $U$ without leaving $U$.

If $U \subset \mathbb{R}^2$ is not simply connected then there are 1-forms on it that are closed
but not exact. The standard example is

\[ \omega = \frac{-y}{r^2}\,dx + \frac{x}{r^2}\,dy \]

where $r^2 = x^2 + y^2$. Its domain of definition is the “punctured plane” $\mathbb{R}^2 \setminus \{O\}$. See
Exercise 65.
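One way to see that this form cannot be exact is to integrate it around a circle centered at the origin. If $\omega$ were $d\alpha$, path independence would force the integral over any closed curve to vanish. The sketch below approximates the integral with a polygonal path; the function name and the step count are illustrative choices only.

```python
import math

def circle_integral(radius=1.0, steps=2000):
    """Approximate the integral of omega = (-y dx + x dy) / r^2 over the
    counterclockwise circle of the given radius centered at the origin."""
    total = 0.0
    for i in range(steps):
        t0 = 2.0 * math.pi * i / steps
        t1 = 2.0 * math.pi * (i + 1) / steps
        tm = 0.5 * (t0 + t1)
        # Evaluate the coefficients at the midpoint of the parameter interval.
        x, y = radius * math.cos(tm), radius * math.sin(tm)
        # Increments of x and y along the chord from t0 to t1.
        dx = radius * (math.cos(t1) - math.cos(t0))
        dy = radius * (math.sin(t1) - math.sin(t0))
        total += (-y * dx + x * dy) / (x * x + y * y)
    return total
```

The value is close to $2\pi$ for every radius, not $0$, so no smooth $\alpha$ on the punctured plane has $d\alpha = \omega$.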
In $\mathbb{R}^3$ it is instructive to consider the 2-form

\[ \omega = \frac{x}{r^3}\,dy \wedge dz + \frac{y}{r^3}\,dz \wedge dx + \frac{z}{r^3}\,dx \wedge dy, \]

where now $r^2 = x^2 + y^2 + z^2$. The form $\omega$ is defined on $U$, which is $\mathbb{R}^3$ minus the origin. $U$ is a spherical shell with inner
radius $0$ and outer radius $\infty$. The form $\omega$ is closed but not exact despite the fact
that $U$ is simply connected. See Exercise 59.


46 Poincaré Lemma If $\omega$ is a closed $k$-form on $\mathbb{R}^n$ then it is exact.


Proof In fact a better result is true. There are “integration operators”

\[ L_k : \Omega^k(\mathbb{R}^n) \to \Omega^{k-1}(\mathbb{R}^n) \]

with the property that $Ld + dL = $ identity. That is, for all $\omega \in \Omega^k(\mathbb{R}^n)$ we have

\[ (L_{k+1} d + d L_k)(\omega) = \omega. \]

From the existence of these integration operators, the Poincaré Lemma is immediate.
For if $d\omega = 0$ then we have

\[ \omega = L(d\omega) + dL(\omega) = dL(\omega), \]

which shows that $\omega$ is exact with antiderivative $\alpha = L(\omega)$.

The construction of $L$ is tricky. First we consider a $k$-form $\beta$, not on $\mathbb{R}^n$, but on
$\mathbb{R}^{n+1}$. It can be expressed uniquely as

\[ \beta = \sum_{I} f_I\,dx_I + \sum_{J} g_J\,dt \wedge dx_J \tag{22} \]

where $f_I = f_I(x,t)$, $g_J = g_J(x,t)$, and $(x,t) \in \mathbb{R}^{n+1} = \mathbb{R}^n \times \mathbb{R}$. The first sum is
taken over all ascending $k$-tuples $I$ in $\{1, \dots, n\}$, and the second over all ascending
$(k-1)$-tuples $J$ in $\{1, \dots, n\}$. The exterior derivative of $\beta$ is

\[ d\beta = \sum_{I,\ell} \frac{\partial f_I}{\partial x_\ell}\,dx_\ell \wedge dx_I + \sum_{I} \frac{\partial f_I}{\partial t}\,dt \wedge dx_I + \sum_{J,\ell} \frac{\partial g_J}{\partial x_\ell}\,dx_\ell \wedge dt \wedge dx_J \tag{23} \]

where $\ell = 1, \dots, n$.

Then we define operators

\[ N : \Omega^k(\mathbb{R}^{n+1}) \to \Omega^{k-1}(\mathbb{R}^n) \]

by setting

\[ N(\beta) = \sum_{J} \Big( \int_0^1 g_J(x,t)\,dt \Big)\,dx_J. \]


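The operator $N$ only looks at terms of the form in which $dt$ appears. It ignores the
others. We claim that for all $\beta \in \Omega^k(\mathbb{R}^{n+1})$ we have

\[ (dN + Nd)(\beta) = \sum_{I} \big( f_I(x,1) - f_I(x,0) \big)\,dx_I \tag{24} \]

where the coefficients $f_I$ take their meaning from (22). By Theorem 14 it is legal to
differentiate past the integral sign. From (23) and the definition of $N$ we get

\[ N(d\beta) = \sum_{I} \Big( \int_0^1 \frac{\partial f_I}{\partial t}\,dt \Big)\,dx_I - \sum_{J,\ell} \Big( \int_0^1 \frac{\partial g_J}{\partial x_\ell}\,dt \Big)\,dx_\ell \wedge dx_J \]

\[ dN(\beta) = \sum_{J,\ell} \Big( \int_0^1 \frac{\partial g_J}{\partial x_\ell}\,dt \Big)\,dx_\ell \wedge dx_J. \]

For the coefficients in $N(\beta)$ are independent of $t$. Therefore

\[ (dN + Nd)(\beta) = \sum_{I} \Big( \int_0^1 \frac{\partial f_I}{\partial t}\,dt \Big)\,dx_I = \sum_{I} \big( f_I(x,1) - f_I(x,0) \big)\,dx_I \]

as claimed in (24).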

Then we define a cone map $\rho : \mathbb{R}^{n+1} \to \mathbb{R}^n$ by

\[ \rho(x, t) = tx, \]

and set $L = N \circ \rho^*$. See Figure 128.

Figure 128 When $n + 1 = 3$ the cone map sends vertical cylinders to
vertical cones, which are then projected to the plane. The maps shown are $(x,y,t) \mapsto (tx,ty,t)$ and $(tx,ty,t) \mapsto (tx,ty)$.

Commutativity of pullback and $d$ gives

\[ Ld + dL = N\rho^* d + dN\rho^* = (Nd + dN)\rho^* \tag{25} \]

so it behooves us to work out $\rho^*(\omega)$. First suppose that $\omega$ is simple, say $\omega = h\,dx_I \in
\Omega^k(\mathbb{R}^n)$. Since $\rho(x, t) = (tx_1, \dots, tx_n)$ we have

\begin{align*}
\rho^*(h\,dx_I) &= (\rho^* h)(\rho^*(dx_I)) = h(tx)\,d\rho_I \\
&= h(tx)\big( d(tx_{i_1}) \wedge \cdots \wedge d(tx_{i_k}) \big) \\
&= h(tx)\big( (t\,dx_{i_1} + x_{i_1}\,dt) \wedge \cdots \wedge (t\,dx_{i_k} + x_{i_k}\,dt) \big) \\
&= h(tx)(t^k\,dx_I) + \text{terms that include } dt
\end{align*}

where $I = (i_1, \dots, i_k)$. From (24) we conclude that

\[ (Nd + dN) \circ \rho^*(h\,dx_I) = \big( h(1x)1^k - h(0x)0^k \big)\,dx_I = h\,dx_I \]

and from (25) we get

\[ (Ld + dL)(h\,dx_I) = h\,dx_I. \tag{26} \]

The linearity of $L$ and $d$ promotes (26) to general $k$-forms,

\[ (Ld + dL)\omega = \omega, \]

and as remarked at the outset, the existence of such an $L$ implies that closed forms
on $\mathbb{R}^n$ are exact. □
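For a 1-form on the plane the operator $L$ can be written out by hand. Unwinding $L = N \circ \rho^*$ for $\omega = f\,dx + g\,dy$, the $dt$-terms of $\rho^*\omega$ are $(f(tx,ty)\,x + g(tx,ty)\,y)\,dt$, so $L(\omega)(x,y) = \int_0^1 (f(tx,ty)\,x + g(tx,ty)\,y)\,dt$. The sketch below evaluates this integral numerically for one hypothetical closed form, $\omega = 2xy\,dx + x^2\,dy$, and differentiates the result by central differences. The function names, the sample form, and the step sizes are assumptions made for illustration.

```python
def L_of_one_form(f, g, x, y, steps=2000):
    """Midpoint-rule value of integral_0^1 (f(tx, ty) x + g(tx, ty) y) dt."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        total += f(t * x, t * y) * x + g(t * x, t * y) * y
    return h * total

# Hypothetical closed 1-form omega = 2xy dx + x^2 dy (closed: g_x = f_y = 2x).
def f(x, y):
    return 2.0 * x * y

def g(x, y):
    return x * x

def alpha(x, y):
    """Candidate antiderivative alpha = L(omega)."""
    return L_of_one_form(f, g, x, y)

def gradient(func, x, y, eps=1e-5):
    """Central-difference estimate of (func_x, func_y) at (x, y)."""
    fx = (func(x + eps, y) - func(x - eps, y)) / (2.0 * eps)
    fy = (func(x, y + eps) - func(x, y - eps)) / (2.0 * eps)
    return fx, fy
```

For this sample form the integral can also be done exactly: $\alpha(x,y) = \int_0^1 3t^2 x^2 y\,dt = x^2 y$, whose differential is $\omega$.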

47 Corollary If $U$ is diffeomorphic to $\mathbb{R}^n$ then all closed forms on $U$ are exact.

Proof Let $T : U \to \mathbb{R}^n$ be a diffeomorphism and assume that $\omega$ is a closed $k$-form
on $U$. Set $\alpha = (T^{-1})^*\omega$. Since pullback commutes with $d$ we see that $\alpha$ is a closed
$k$-form on $\mathbb{R}^n$. By the Poincaré Lemma there is a $(k-1)$-form $\mu$ on $\mathbb{R}^n$ with $\alpha = d\mu$.
Then

\[ dT^*\mu = T^*d\mu = T^*\alpha = T^* \circ (T^{-1})^*\omega = (T^{-1} \circ T)^*\omega = \mathrm{id}^*\omega = \omega \]

which shows that $\omega$ is exact with antiderivative $T^*\mu$. □

48 Corollary Locally, closed forms defined on open subsets of $\mathbb{R}^n$ are exact.

Proof Locally an open subset of $\mathbb{R}^n$ is diffeomorphic to $\mathbb{R}^n$. □
49 Corollary If $U \subset \mathbb{R}^n$ is open and starlike (in particular, if $U$ is convex) then
closed forms on $U$ are exact.

Proof A starlike set $U \subset \mathbb{R}^n$ contains a point $p$ such that the line segment from
each $q \in U$ to $p$ lies in $U$. Every starlike open set in $\mathbb{R}^n$ is diffeomorphic to $\mathbb{R}^n$. See
Exercise 52. □

50 Corollary A smooth vector field $F$ on $\mathbb{R}^3$ (or on an open set diffeomorphic to
$\mathbb{R}^3$) is the gradient of a scalar function if and only if its curl is everywhere zero.

Proof If $F = \operatorname{grad} \phi$ then

\[ F = (\phi_x, \phi_y, \phi_z) \;\Rightarrow\; \operatorname{curl} F = (\phi_{zy} - \phi_{yz},\; \phi_{xz} - \phi_{zx},\; \phi_{yx} - \phi_{xy}) = 0. \]

On the other hand, if $F = (f, g, h)$ then

\[ \operatorname{curl} F = 0 \;\Rightarrow\; \omega = f\,dx + g\,dy + h\,dz \]

is closed and therefore exact. A function $\phi$ with $d\phi = \omega$ has gradient $F$. □

51 Corollary A smooth vector field on $\mathbb{R}^3$ (or on an open set diffeomorphic to $\mathbb{R}^3$)
has everywhere zero divergence if and only if it is the curl of some other vector field.

Proof If $F = (f, g, h)$ and $G = \operatorname{curl} F$ then

\[ G = (h_y - g_z,\; f_z - h_x,\; g_x - f_y) \]

so the divergence of $G$ is zero. On the other hand, if the divergence of $G = (A, B, C)$
is zero then the form

\[ \omega = A\,dy \wedge dz + B\,dz \wedge dx + C\,dx \wedge dy \]

is closed and therefore exact. If the form $\alpha = f\,dx + g\,dy + h\,dz$ has $d\alpha = \omega$ then
$F = (f, g, h)$ has $\operatorname{curl} F = G$. □
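The easy half of Corollary 51, that a curl has zero divergence, is the identity $d^2 = 0$ in disguise, and it can be observed numerically. The sketch below uses central differences for every partial derivative; the sample field `F`, the step `EPS`, and the helper names are hypothetical and chosen only for illustration.

```python
import math

EPS = 1e-3  # finite-difference step, an arbitrary illustrative choice

def partial(func, i, p):
    """Central-difference partial derivative of a scalar func along axis i at p."""
    hi, lo = list(p), list(p)
    hi[i] += EPS
    lo[i] -= EPS
    return (func(hi) - func(lo)) / (2.0 * EPS)

def curl(F):
    """curl of F = (f, g, h), namely (h_y - g_z, f_z - h_x, g_x - f_y)."""
    f, g, h = (lambda p, k=k: F(p)[k] for k in range(3))
    return lambda p: (partial(h, 1, p) - partial(g, 2, p),
                      partial(f, 2, p) - partial(h, 0, p),
                      partial(g, 0, p) - partial(f, 1, p))

def div(G):
    """Divergence of G = (A, B, C), namely A_x + B_y + C_z."""
    return lambda p: sum(partial(lambda q, k=k: G(q)[k], k, p) for k in range(3))

# Hypothetical smooth vector field on R^3.
def F(p):
    x, y, z = p
    return (y * z ** 2, math.sin(x) * z, x * y * math.exp(z))
```

Because central differences in different coordinates commute, `div(curl(F))` vanishes up to rounding error even though `curl(F)` itself is far from zero.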


Cohomology 

The set of exact $k$-forms on $U$ is usually denoted $B^k(U)$, while the set of closed
$k$-forms is denoted $Z^k(U)$. (“$B$” is for boundary and “$Z$” is for cycle.) Both are
vector subspaces of $\Omega^k(U)$ and

\[ B^k(U) \subset Z^k(U). \]

The quotient vector space

\[ H^k(U) = Z^k(U)/B^k(U) \]

is the $k$th de Rham cohomology group of $U$. Its members are the “cohomology
classes” of $U$. As was discussed above, if $U$ is simply connected then $H^1(U) = 0$.
Also, $H^2(U) \ne 0$ when $U$ is the three-dimensional spherical shell. If $U$ is starlike
then $H^k(U) = 0$ for all $k > 0$, and $H^0(U) = \mathbb{R}$. Cohomology necessarily reflects the
global topology of $U$. For locally, closed forms are exact. The relation between the
cohomology of $U$ and its topology is the subject of algebraic topology, the basic idea
being that the more complicated the set $U$ (think of Swiss cheese), the more complicated
is its cohomology, and vice versa. The book From Calculus to Cohomology by
Madsen and Tornehave provides a beautiful exposition of the subject.

Differential Forms Viewed Pointwise 

The preceding part of this section presents differential forms as “abstract inte- 
grands” - things which it makes sense to write after an integral sign. But they are 
not defined as functions that have values point by point. Rather they are special 
functionals on the space of cells. This is all well and good since it provides a clean 
path to the main result about forms, the Stokes Formula. 

A different path to Stokes involves multilinear functionals. You have already seen
bilinear functionals like the dot product. It is a map $\beta : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$ with various
properties, the first being that for each $v \in \mathbb{R}^n$ the maps

\[ w \mapsto \beta(v, w) \quad \text{and} \quad w \mapsto \beta(w, v) \]

are linear. We say $\beta$ is linear in each vector variable separately. A map $\beta : \mathbb{R}^n \times
\cdots \times \mathbb{R}^n \to \mathbb{R}$ which is linear in each vector variable separately is a $k$-multilinear
functional. (Its domain is the Cartesian product of $k$ copies of $\mathbb{R}^n$.) It is alternating
if for each permutation $\pi$ of $\{1, \dots, k\}$ we have

\[ \beta(v_1, \dots, v_k) = \operatorname{sgn}(\pi)\,\beta(v_{\pi(1)}, \dots, v_{\pi(k)}). \]

The set of alternating $k$-linear forms is a vector space $A^k$, and one can view $\omega \in
\Omega^k(\mathbb{R}^n)$ at a point $p$ as a member $\omega_p \in A^k$. It is a certain type of tensor that we
integrate over a cell as $p$ varies in the cell; the vectors on which $\omega_p$ is evaluated are
tangent to the cell at $p$. You can read about this approach to differential forms in
Michael Spivak’s book Calculus on Manifolds.
10* The Brouwer Fixed-Point Theorem


Let $B = B^n$ be the closed unit $n$-ball,

\[ B = \{x \in \mathbb{R}^n : |x| \le 1\}. \]

The following is one of the deep results in topology and analysis:


52 Brouwer Fixed-Point Theorem If $F : B \to B$ is continuous then it has a
fixed-point, a point $p \in B$ such that $F(p) = p$.


Proof The proof is relatively short and depends on Stokes’ Theorem. Note that
Brouwer’s Theorem is trivial when $n = 0$, for $B^0$ is a point and is the fixed-point of
$F$. Also, if $n = 1$ then, as observed on page 242, the result is a consequence of the
Intermediate Value Theorem on $B^1 = [-1, 1]$. For the continuous function $F(x) - x$
is nonnegative at $x = -1$ and nonpositive at $x = +1$, so at some $p \in [-1, 1]$ we have
$F(p) - p = 0$; i.e., $F(p) = p$.
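In the one-dimensional case the Intermediate Value Theorem argument is constructive enough to compute with. The following is a minimal sketch, assuming only that `F` is continuous and maps $[-1, 1]$ into itself; it runs bisection on $F(x) - x$. The function name and tolerance are illustrative.

```python
import math

def fixed_point_1d(F, tol=1e-12):
    """Locate a fixed point of a continuous F : [-1, 1] -> [-1, 1] by
    bisection on h(x) = F(x) - x, using h(-1) >= 0 >= h(1)."""
    lo, hi = -1.0, 1.0
    if F(lo) == lo:
        return lo
    if F(hi) == hi:
        return hi
    # Invariant: h(lo) >= 0 and h(hi) < 0.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if F(mid) - mid >= 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For example, `fixed_point_1d(math.cos)` approximates the solution of $\cos p = p$, since cosine maps $[-1, 1]$ into itself. No such sign argument is available when $n \ge 2$, which is why the proof in higher dimensions takes a different route.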


The strategy of the proof in higher dimensions is to suppose that there does exist
a continuous $F : B \to B$ which fails to have a fixed-point, and from this supposition
to derive a contradiction, namely that the volume of $B$ is zero. The first step in the
proof is standard.


Step 1. The existence of a continuous $F : B \to B$ without a fixed-point implies
the existence of a smooth retraction $T$ of a neighborhood $U$ of $B$ to $\partial B$. The map
$T$ sends $U$ to $\partial B$ and fixes every point of $\partial B$.


If $F$ has no fixed-point as $x$ varies in $B$, then compactness of $B$ implies there is
some $\mu > 0$ such that for all $x \in B$ we have

\[ |F(x) - x| > \mu. \]

The Stone-Weierstrass Theorem then produces a multivariable polynomial $\tilde{F} : \mathbb{R}^n \to
\mathbb{R}^n$ that $\mu/2$-approximates $F$ on $B$. The map

\[ G(x) = \frac{1}{1 + \mu/2}\,\tilde{F}(x) \]

is smooth and sends $B$ into the interior of $B$. It $\mu$-approximates $F$ on $B$, so it too
has no fixed-point. The restriction of $G$ to a small neighborhood $U$ of $B$ also sends
$U$ into $B$ and has no fixed-point.

Figure 129 shows how to construct the retraction $T$ from the map $G$. Since $G$ is
smooth, so is $T$.
Figure 129 $T$ retracts $U$ onto $\partial B$. The point $u \in U$ is sent by $T$ to the
unique point $u' = T(u)$ at which the segment $[u, G(u)]$, extended through $u$,
crosses the sphere $\partial B$.


Step 2. $T^*$ kills all $n$-forms. If there is a point $p \in U$ such that $(DT)_p$ is invertible
then the Inverse Function Theorem implies $TU$ contains an open $n$-dimensional ball
at $Tp$. Since no such ball is contained in $\partial B = TU$, $DT$ is nowhere invertible, its
Jacobian determinant $\partial T/\partial u$ is everywhere zero, and $T^* : \Omega^n(\mathbb{R}^n) \to \Omega^n(U)$ is the
zero map.

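Step 3. There is a map $\varphi : I^n \to B$ that exhibits $B$ as an $n$-cell such that

(a) $\varphi$ is smooth.

(b) $\varphi(I^n) = B$ and $\varphi(\partial I^n) = \partial B$.

(c) $\displaystyle \int_{I^n} \frac{\partial\varphi}{\partial u}\,du > 0$.

To construct $\varphi$, start with a smooth function $\sigma : \mathbb{R} \to \mathbb{R}$ such that $\sigma(r) = 0$ for
$r \le 1/2$, $\sigma'(r) > 0$ for $1/2 < r < 1$, and $\sigma(r) = 1$ for $r \ge 1$. Then define $\psi : \mathbb{R}^n \to \mathbb{R}^n$ by

\[ \psi(v) = \begin{cases} v + \sigma(|v|)\Big(\dfrac{v}{|v|} - v\Big) & \text{if } v \ne 0 \\ 0 & \text{if } v = 0. \end{cases} \]

See Figure 130 and Exercise 53. Since $\sigma(|v|) = 0$ when $|v| \le 1/2$, $\psi$ is smooth.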



Figure 130 The map $\psi$ crushes all of $\mathbb{R}^n$ onto the closed unit ball $B^n$. It is
a diffeomorphism of the interior of $B^n$ to itself, and fixes each point of
$\partial B^n = S^{n-1}$. Its derivative has rank $n - 1$ at each point of $\mathbb{R}^n \setminus \operatorname{int} B^n$.
Restricted to each $(n-1)$-dimensional face $E$ of the cube $[-1, 1]^n$, $\psi$ is a
diffeomorphism from the interior of $E$ to one of the $2n$ open cubical polar
caps on $S^{n-1}$. See also Figure 131 and Exercise 52.


The map $\psi$ carries the sphere $S_r$ of radius $r$ to the sphere of radius

\[ \rho(r) = r + \sigma(r)(1 - r), \]

sending each radial line into itself. Set $\varphi = \psi \circ \kappa$ where $\kappa$ scales $I^n$ to $[-1, 1]^n$ by the
affine map $\kappa : u \mapsto v = (2u_1 - 1, \dots, 2u_n - 1)$. Then

(i) $\varphi$ is smooth since $\psi$ and $\kappa$ are smooth.

(ii) $\varphi$ sends $\partial I^n$ to $\partial B$ since $\psi$ sends $\partial([-1, 1]^n)$ to $\partial B$.

(iii) It is left as Exercise 70 to show that the Jacobian of $\psi$ is $\rho'(r)\rho(r)^{n-1}/r^{n-1}$ when
$r = |v|$. Thus, the Jacobian $\partial\varphi/\partial u$ is always nonnegative, and is identically
equal to $2^n$ on the ball of radius $1/4$ at the center of $I^n$, so its integral on $I^n$ is
positive.

Step 4. Consider an $(n-1)$-form $\alpha$. If $\beta : I^{n-1} \to \mathbb{R}^n$ is an $(n-1)$-cell whose
image lies in $\partial B$ then

\[ \int_{\beta} \alpha = \int_{T \circ \beta} \alpha = \int_{\beta} T^*\alpha \]

since $T$ is the identity map on $\partial B$. The $(n-1)$-dimensional faces of $\varphi : I^n \to B$ lie
in $\partial B$. Thus

\[ \int_{\partial\varphi} \alpha = \int_{\partial\varphi} T^*\alpha. \tag{27} \]

Figure 131 There are six polar caps at the six poles of the 2-sphere.

Step 5. Now we get the contradiction. Consider the specific $(n-1)$-form

\[ \alpha = x_1\,dx_2 \wedge \cdots \wedge dx_n. \]

Note that $d\alpha = dx_1 \wedge \cdots \wedge dx_n$ is $n$-dimensional volume and

\[ \int_{\varphi} d\alpha = \int_{I^n} \frac{\partial\varphi}{\partial u}\,du > 0. \]

In fact the integral is the volume of $B$. However, we also have

\begin{align*}
\int_{\varphi} d\alpha &= \int_{\partial\varphi} \alpha && \text{by Stokes’ Theorem on a cell} \\
&= \int_{\partial\varphi} T^*\alpha && \text{by Equation (27)} \\
&= \int_{\varphi} dT^*\alpha && \text{by Stokes’ Theorem on a cell} \\
&= \int_{\varphi} T^*d\alpha && \text{by (d) in Theorem 43} \\
&= 0 && \text{by Step 2.}
\end{align*}

This is a contradiction - an integral cannot simultaneously be zero and positive. The
assumption that there exists a continuous $F : B \to B$ with no fixed-point has led to
a contradiction. Therefore it is untenable and every $F$ does have a fixed-point. □


Appendix A Perorations of Dieudonne 

In his classic book, Foundations of Analysis, Jean Dieudonne of the
French Bourbaki school writes

“The subject matter of this Chapter [Chapter VIII on differential calculus] 
is nothing else but the elementary theorems of Calculus, which however 
are presented in a way which will probably be new to most students. That 
presentation which throughout adheres strictly to our general ‘geometric’ 
outlook on Analysis, aims at keeping as close as possible to the fundamen- 
tal idea of Calculus, namely the local approximation of functions by linear 
functions. In the classical teaching of Calculus, this idea is immediately 
obscured by the accidental fact that, on a one-dimensional vector space, 
there is a one-to-one correspondence between linear forms and numbers, 
and therefore the derivative at a point is defined as a number instead of 
a linear form. This slavish subservience to the shibboleth of numerical 
interpretation at any cost becomes much worse when dealing with func- 
tions of several variables: One thus arrives, for instance, at the classical 
formula”... “giving the partial derivatives of a composite function, which 
has lost any trace of intuitive meaning, whereas the natural statement of 
the theorem is of course that the (total) derivative of a composite function 
is the composite of their derivatives” “a very sensible formulation when
one thinks in terms of linear approximation.” 

“This ‘intrinsic’ formulation of Calculus, due to its greater ‘abstraction’, 
and in particular to the fact that again and again, one has to leave the 
initial spaces and climb higher and higher to new ‘function spaces’ (es- 
pecially when dealing with the theory of higher derivatives), certainly 
requires some mental effort, contrasting with the comfortable routine of 
the classical formulas. But we believe the result is well worth the labor, 
as it will prepare the student to the still more general idea of Calculus on 
a differentiable manifold; the reader who wants to have a glimpse of that 
theory and of the questions to which it leads can look into the books of 
Chevalley and de Rham. Of course, he will observe in these applications, 
all the vector spaces which intervene have finite dimension; if that gives 
him an additional feeling of security, he may of course add that assump- 
tion to all the theorems of this chapter. But he will inevitably realize 
that this does not make the proofs shorter or simpler by a single line; in 
other words the hypothesis of finite dimension is entirely irrelevant to the 
material developed below; we have therefore thought it best to dispense 
with it altogether, although the applications of Calculus which deal with 
the finite-dimensional case still by far exceed the others in number and 
importance.” 

I share most of Dieudonne’s opinions expressed here. And where else will you 
read the phrase “slavish subservience to the shibboleth of numerical interpretation 
at any cost”? 


Appendix B The History of Cavalieri’s Principle 

The following is from Marsden and Weinstein’s Calculus. 

The idea behind the slice method goes back, beyond the invention of 
calculus, to Francesco Bonaventura Cavalieri (1598-1647), a student of 
Galileo and then professor at the University of Bologna. An accurate 
report of the events leading to Cavalieri’s discovery is not available, so we 
have taken the liberty of inventing one. 

Cavalieri’s delicatessen usually produced bologna in cylindrical form, so
that the volume would be computed as $\pi \cdot \text{radius}^2 \cdot \text{length}$. One day the
casings were a bit weak, and the bologna came out with odd bulges. The
scale was not working that day, either, so the only way to compute the
price of the bologna was in terms of its volume.

Cavalieri took his best knife and sliced the bologna into $n$ very thin slices,
each of thickness $x$, and measured the radii, $r_1, r_2, \dots, r_n$ of the slices
(fortunately they were all round). He then estimated the volume to be
$\sum_{i=1}^{n} \pi r_i^2 x$, the sum of the volumes of the slices.

Cavalieri was moonlighting from his regular job as a professor at the Uni- 
versity of Bologna. That afternoon he went back to his desk and began 
the book Geometria indivisibilium continuorum nova quandum ratione 
promota (Geometry shows the continuous indivisibility between new ra- 
tions and getting promoted), in which he stated what is now known as 
Cavalieri’s principle: If two solids are sliced by a family of parallel planes 
in such a way that corresponding sections have equal areas, then the two 
solids have the same volume. 

The book was such a success that Cavalieri sold his delicatessen and re- 
tired to a life of occasional teaching and eternal glory. 


Appendix C A Short Excursion into the Com- 
plex Field 

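The field $\mathbb{C}$ of complex numbers corresponds bijectively with $\mathbb{R}^2$. The complex number
$z = x + iy \in \mathbb{C}$ corresponds to $(x, y) \in \mathbb{R}^2$. A function $T : \mathbb{C} \to \mathbb{C}$ is complex linear
if for all $\lambda, z, w \in \mathbb{C}$ we have

\[ T(z + w) = T(z) + T(w) \quad \text{and} \quad T(\lambda z) = \lambda T(z). \]

Since $\mathbb{C}$ is a one-dimensional complex vector space the value $\mu = T(1)$ determines $T$,
namely, $T(z) = \mu z$ for all $z$. If $z = x + iy$ and $\mu = \alpha + i\beta$ then $\mu z = (\alpha x - \beta y) +
i(\beta x + \alpha y)$. In $\mathbb{R}^2$ terms $T : (x, y) \mapsto ((\alpha x - \beta y), (\beta x + \alpha y))$ which shows that $T$ is a
linear transformation $\mathbb{R}^2 \to \mathbb{R}^2$ whose matrix is

\[ \begin{bmatrix} \alpha & -\beta \\ \beta & \alpha \end{bmatrix}. \]

The form of this matrix is special. For instance it could never be

\[ \begin{bmatrix} 2 & 1 \\ 1 & 1 \end{bmatrix}. \]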
A complex function of a complex variable $f(z)$ has a complex derivative $f'(z)$
if the complex ratio $(f(z + h) - f(z))/h$ tends to $f'(z)$ as the complex number $h$ tends
to zero. Equivalently,

\[ \frac{f(z + h) - f(z) - f'(z)h}{h} \to 0 \]

as $h \to 0$. Write $f(z) = u(x, y) + iv(x, y)$ where $z = x + iy$, and $u$, $v$ are real-valued
functions of two real variables. Define $F : \mathbb{R}^2 \to \mathbb{R}^2$ by $F(x, y) = (u(x, y), v(x, y))$.
Then $F$ is $\mathbb{R}$-differentiable with derivative matrix

\[ DF = \begin{bmatrix} \dfrac{\partial u}{\partial x} & \dfrac{\partial u}{\partial y} \\[2mm] \dfrac{\partial v}{\partial x} & \dfrac{\partial v}{\partial y} \end{bmatrix}. \]

Since this derivative matrix is the $\mathbb{R}^2$ expression for multiplication by the complex
number $f'(z)$, it must have the $\begin{bmatrix} \alpha & -\beta \\ \beta & \alpha \end{bmatrix}$ form. This demonstrates a basic fact
about complex differentiable functions - their real and imaginary parts, $u$ and $v$,
satisfy the


53 Cauchy-Riemann Equations

\[ \frac{\partial u}{\partial x} = \frac{\partial v}{\partial y} \quad \text{and} \quad \frac{\partial u}{\partial y} = -\frac{\partial v}{\partial x}. \]

Appendix D Polar Form 

The shape of the image of a unit ball under a linear transformation T is not an issue 
that is used directly in anything we do in Chapter 5 but it certainly underlies the 
geometric outlook on linear algebra. 

Question. What shape is the $(n-1)$-sphere $S^{n-1}$?

Answer. Round.

Question. What shape is $T(S^{n-1})$?

Answer. Ellipsoidal. See also Exercise 39.

Let $z = x + iy$ be a nonzero complex number. Its polar form is $z = re^{i\theta}$ where
$r > 0$ and $0 \le \theta < 2\pi$, and $x = r\cos\theta$, $y = r\sin\theta$. Multiplication by $z$ breaks up
into multiplication by $r$, which is just dilation, and multiplication by $e^{i\theta}$, which is
rotation of the plane by angle $\theta$. As a matrix the rotation is

\[ \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}. \]

The polar coordinates of $(x, y)$ are $(r, \theta)$.

Analogously, consider an isomorphism $T : \mathbb{R}^n \to \mathbb{R}^n$. Its polar form is

\[ T = OP \]

where $O$ and $P$ are isomorphisms $\mathbb{R}^n \to \mathbb{R}^n$ such that

(a) $O$ is like $e^{i\theta}$; it is an orthogonal isomorphism.

(b) $P$ is like $r$; it is a positive definite symmetric (PDS) isomorphism.

Orthogonality of $O$ means that for all $v, w \in \mathbb{R}^n$ we have

\[ \langle Ov, Ow \rangle = \langle v, w \rangle, \]

while $P$ being PDS means that for all nonzero vectors $v, w \in \mathbb{R}^n$ we have

\[ \langle Pv, v \rangle > 0 \quad \text{and} \quad \langle Pv, w \rangle = \langle v, Pw \rangle. \]

The notation $\langle v, w \rangle$ indicates the usual dot product on $\mathbb{R}^n$.

The polar form $T = OP$ reveals everything geometric about $T$. The geometric
effect of $O$ is nothing. It is an isometry and changes no distances or shapes. It is
rigid. The effect of a PDS operator $P$ is easy to describe. In linear algebra it is shown
that there exists a basis $\mathcal{B} = \{u_1, \dots, u_n\}$ of orthonormal vectors (the vectors are of
unit length and are mutually perpendicular) and with respect to this basis we have

\[ P = \begin{bmatrix} \lambda_1 & 0 & \cdots & & 0 \\ 0 & \lambda_2 & & & \\ \vdots & & \ddots & & \vdots \\ & & & \lambda_{n-1} & 0 \\ 0 & & \cdots & 0 & \lambda_n \end{bmatrix}. \]

The diagonal entries $\lambda_i$ are positive. $P$ stretches each $u_i$ by the factor $\lambda_i$. Thus $P$
stretches the unit sphere to an $n$-dimensional ellipsoid. The $u_i$ are its axes. The
norm of $P$ and hence of $T$ is the largest $\lambda_i$ while the conorm is the smallest $\lambda_i$. The
ratio of the largest to the smallest, the condition number, is the eccentricity of the
ellipsoid.
Upshot Except for the harmless orthogonal factor $O$, an isomorphism is no more
geometrically complicated than a diagonal matrix with positive entries.


54 Polar Form Theorem Each isomorphism $T : \mathbb{R}^n \to \mathbb{R}^n$ factors as $T = OP$,
where $O$ is orthogonal and $P$ is PDS.


Proof Recall that the transpose of $T : \mathbb{R}^n \to \mathbb{R}^n$ is the unique isomorphism $T^t$
satisfying the equation

\[ \langle Tv, w \rangle = \langle v, T^t w \rangle \]

for all $v, w \in \mathbb{R}^n$. Thus the condition $\langle Pv, w \rangle = \langle v, Pw \rangle$ in the definition of PDS
means exactly that $P^t = P$.

Let $T$ be a given isomorphism $T : \mathbb{R}^n \to \mathbb{R}^n$. We must find its factors $O$ and
$P$. We just write them down as follows. Consider the composite $T^t \circ T$. It is PDS
because

\[ (T^t T)^t = (T^t)(T^t)^t = T^t T \quad \text{and} \quad \langle T^t T v, v \rangle = \langle Tv, Tv \rangle > 0. \]

Every PDS transformation has a unique PDS square root, just as does every positive
real number $r$. (To see this, take the diagonal matrix with entries $\sqrt{\lambda_i}$ in place of
$\lambda_i$.) Thus $T^t T$ has a PDS square root and this is the factor $P$ that we seek,

\[ P^2 = T^t T. \]

By $P^2$ we mean the composite $P \circ P$. In order for the formula $T = OP$ to hold with
this choice of $P$ we must have $O = TP^{-1}$. To finish the proof we merely must check
that $TP^{-1}$ actually is orthogonal. Magically,

\begin{align*}
\langle Ov, Ow \rangle &= \langle TP^{-1}v, TP^{-1}w \rangle = \langle P^{-1}v, T^t T P^{-1} w \rangle \\
&= \langle P^{-1}v, Pw \rangle = \langle P^t P^{-1} v, w \rangle = \langle PP^{-1}v, w \rangle \\
&= \langle v, w \rangle
\end{align*}

which implies that $O$ is orthogonal. □
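The recipe in the proof, $P = \sqrt{T^t T}$ and $O = TP^{-1}$, can be carried out by hand for $2 \times 2$ matrices. The sketch below is one way to do it, under a stated shortcut: instead of diagonalizing, it uses the $2 \times 2$ identity $\sqrt{M} = (M + \sqrt{\det M}\, I)/\sqrt{\operatorname{tr} M + 2\sqrt{\det M}}$ for a PDS matrix $M$, which follows from the Cayley-Hamilton Theorem. The function name and the tuple representation of matrices are assumptions made for illustration.

```python
import math

def polar_2x2(T):
    """Return (O, P) with T = O P, O orthogonal and P positive definite
    symmetric, for an invertible 2x2 matrix T = ((a, b), (c, d))."""
    (a, b), (c, d) = T
    # Entries of M = T^t T, which is positive definite symmetric.
    m11, m12, m22 = a * a + c * c, a * b + c * d, b * b + d * d
    s = math.sqrt(m11 * m22 - m12 * m12)   # sqrt(det M), equal to |det T|
    t = math.sqrt(m11 + m22 + 2.0 * s)     # trace of the square root P
    # Square root of M via the 2x2 Cayley-Hamilton identity.
    p11, p12, p22 = (m11 + s) / t, m12 / t, (m22 + s) / t
    # Inverse of P by the adjugate formula.
    det_p = p11 * p22 - p12 * p12
    q11, q12, q22 = p22 / det_p, -p12 / det_p, p11 / det_p
    # O = T P^{-1}.
    O = ((a * q11 + b * q12, a * q12 + b * q22),
         (c * q11 + d * q12, c * q12 + d * q22))
    P = ((p11, p12), (p12, p22))
    return O, P
```

For instance, `polar_2x2(((0.0, -2.0), (1.0, 0.0)))` should return a quarter-turn rotation for `O` and the diagonal stretch with entries $1$ and $2$ for `P`.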


55 Corollary Under any invertible $T : \mathbb{R}^n \to \mathbb{R}^n$ the unit ball is sent to an ellipsoid.


Proof Write $T$ in polar form $T = OP$. The image of the unit ball under $P$ is an
ellipsoid. The orthogonal factor $O$ merely rotates the ellipsoid. □
Appendix E Determinants

A permutation of a set $S$ is a bijection $\pi : S \to S$. That is, $\pi$ is one-to-one and onto.
We assume the set $S$ is finite, $S = \{1, \dots, k\}$. The sign of $\pi$ is

\[ \operatorname{sgn}(\pi) = (-1)^r \]

where $r$ is the number of reversals - i.e., the number of pairs $i, j$ such that

\[ i < j \quad \text{and} \quad \pi(i) > \pi(j). \]

56 Proposition Every permutation is the composite of pair transpositions; the sign
of a composite permutation is the product of the signs of its factors; and the sign of
a pair transposition is $-1$.

The proof of this combinatorial proposition is left to the reader. Although the
factorization of a permutation $\pi$ into pair transpositions is not unique, the number
of factors, say $t$, satisfies $(-1)^t = \operatorname{sgn}(\pi)$.

Definition The determinant of a $k \times k$ matrix $A$ is the sum

\[ \det A = \sum_{\pi} \operatorname{sgn}(\pi)\, a_{1\pi(1)} a_{2\pi(2)} \cdots a_{k\pi(k)} \]

where $\pi$ ranges through all permutations of $\{1, \dots, k\}$.

Equivalent definitions appear in standard linear algebra courses. One of the key
facts about determinants is the product rule: For two $k \times k$ matrices we have

\[ \det AB = \det A \, \det B. \]

It extends to nonsquare matrices as follows.

57 Cauchy-Binet Formula Assume that $k \le n$. If $A$ is a $k \times n$ matrix and $B$ is
an $n \times k$ matrix, then the determinant of the product $k \times k$ matrix $AB = C$ is given
by the formula

\[ \det C = \sum_{J} \det A^J \det B_J, \]

where $J$ ranges through the set of ascending $k$-tuples in $\{1, \dots, n\}$; $A^J$ is the $k \times k$
minor of $A$ whose column indices $j$ belong to $J$, while $B_J$ is the $k \times k$ minor of $B$
whose row indices $i$ belong to $J$. See Figure 132.
Figure 132 The paired $4 \times 4$ minors of $A$ and $B$ are determined by the
4-tuple $J$.


Proof Note that special cases of the Cauchy-Binet Formula occur when $k = 1$ or
$k = n$. When $k = 1$, $C$ is the $1 \times 1$ matrix that is the dot product of an $A$-row vector
of length $n$ times a $B$-column vector of height $n$. The 1-tuples $J$ in $\{1, \ldots, n\}$ are just
single integers, $J = (1), \ldots, J = (n)$, and the product formula is immediate. In the
second case, $k = n$, we have the usual product determinant formula because there is
only one ascending $k$-tuple in $\{1, \ldots, k\}$, namely $J = (1, \ldots, k)$.

To handle the general case, define the sum 

$$S(A, B) = \sum_{J} \det A^J \det B_J$$

as above. Consider an elementary $n \times n$ matrix $E$. We claim that

$$S(A, B) = S(AE, E^{-1}B).$$


Since there are only two types of elementary matrices, this is not too hard a calcu- 
lation, and is left to the reader. Then we perform a sequence of elementary column 
operations on A to put it in lower triangular form 


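$$A' = AE_1 \cdots E_r = \begin{pmatrix}
a'_{11} & 0 & \cdots & 0 & 0 & \cdots & 0 \\
a'_{21} & a'_{22} & \cdots & 0 & 0 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots & & \vdots \\
a'_{k1} & a'_{k2} & \cdots & a'_{kk} & 0 & \cdots & 0
\end{pmatrix}.$$

About $B' = E_r^{-1} \cdots E_1^{-1} B$ we observe only that

$$AB = A'B' = A'^{J_0} B'_{J_0}$$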
where $J_0 = (1, \ldots, k)$. Since elementary column operations do not affect $S$ we have

$$S(A, B) = S(AE_1, E_1^{-1}B) = S(AE_1E_2, E_2^{-1}E_1^{-1}B) = \cdots = S(A', B').$$

All terms in the sum that defines $S(A', B')$ are zero except the $J_0$-th, and thus

$$\det(AB) = \det A'^{J_0} \det B'_{J_0} = S(A', B') = S(A, B)$$


as claimed. 


□ 
Exercises

1. Let $T : V \to W$ be a linear transformation, and let $p \in V$ be given. Prove that
the following are equivalent. 

(a) T is continuous at the origin. 

(b) T is continuous at p. 

(c) T is continuous at at least one point of V. 

2. Let L be the vector space of continuous linear transformations from a normed 
space V to a normed space W. Show that the operator norm makes L a normed 
space. 

3. Let $T : V \to W$ be a linear transformation between normed spaces. Show that

$$\|T\| = \sup\{|Tv| : |v| < 1\}
      = \sup\{|Tv| : |v| \le 1\}
      = \sup\{|Tv| : |v| = 1\}
      = \inf\{M : v \in V \Rightarrow |Tv| \le M|v|\}.$$

4. The conorm of a linear transformation $T : \mathbb{R}^n \to \mathbb{R}^m$ is

$$m(T) = \inf\left\{ \frac{|Tv|}{|v|} : v \ne 0 \right\}.$$

It is the minimum stretch that $T$ imparts to vectors in $\mathbb{R}^n$. Let $U$ be the
unit ball in $\mathbb{R}^n$.

(a) Show that the norm and conorm of T are the radii of the smallest ball 
that contains TU and the largest ball contained in TU . 

(b) Is the same true in normed spaces? 

(c) If T is an isomorphism, prove that its conorm is positive. 

(d) Is the converse to (c) true? 

(e) If $T : \mathbb{R}^n \to \mathbb{R}^n$ has positive conorm, why is $T$ an isomorphism?

(f) If the norm and conorm of T are equal, what can you say about T? 

5. Formulate and prove the fact that function composition is associative. Why 
can you infer that matrix multiplication is associative? 

6. Let $M_n$ and $L_n$ be the vector spaces of $n \times n$ matrices and linear transformations
$\mathbb{R}^n \to \mathbb{R}^n$.


(a) Look up the definition of “ring” in your algebra book. 

(b) Show that M n and L n are rings with respect to matrix multiplication and 
composition. 

(c) Show that $T : M_n \to L_n$ is a ring isomorphism.


7. Two norms $|\ |_1$ and $|\ |_2$ on a vector space $V$ are comparable† if there are
positive constants $c$ and $C$ such that for all nonzero vectors $v$ in $V$ we have

$$c \le \frac{|v|_1}{|v|_2} \le C.$$

†From an analyst’s point of view, the choice between comparable norms has little importance. At
worst it affects a few constants that turn up in estimates.



(a) Prove that comparability is an equivalence relation on norms. 

(b) Prove that any two norms on a finite-dimensional vector space are
comparable. [Hint: Use Theorem 3.]

(c) Consider the norms

$$|f|_{L^1} = \int_0^1 |f(t)|\,dt \quad\text{and}\quad |f|_{C^0} = \max\{|f(t)| : t \in [0, 1]\},$$

defined on the infinite-dimensional vector space $C^0$ of continuous
functions $f : [0, 1] \to \mathbb{R}$. Show that the norms are not comparable by finding
functions $f \in C^0$ whose integral norm is small but whose $C^0$ norm is 1.

*8. Let $|\ | = |\ |_{C^0}$ be the supremum norm on $C^0$ as in the previous exercise.
Define an integral transformation $T : C^0 \to C^0$ by

$$T : f \mapsto \int_0^x f(t)\,dt.$$
Jo 


(a) Show that T is linear, continuous, and find its norm. 

(b) Let $f_n(t) = \cos(nt)$, $n = 1, 2, \ldots$. What is $T(f_n)$?

(c) Is the set of functions $K = \{f_n : n \in \mathbb{N}\}$ closed? Bounded? Compact?

(d) Is T(K) compact? How about its closure? 

9. Give an example of two 2x2 matrices such that the operator norm of the 
product is less than the product of the operator norms. 

10. In the proof of Theorem 3 we used the fact that with respect to the Euclidean 
norm, the length of a vector is at least as large as the length of any of its 
components. Show by example that this is false for some norms in R 2 . [Hint: 
Consider the matrix 



$$A = \begin{pmatrix} 3 & -2 \\ -2 & 2 \end{pmatrix}.$$


Use $A$ to define an inner product $\langle v, w \rangle_A = \sum v_i a_{ij} w_j$ on $\mathbb{R}^2$, and use the inner
product to define a norm

$$|v|_A = \sqrt{\langle v, v \rangle_A}.$$

(What properties must $A$ have for the sum to define an inner product? Does
$A$ have these properties?) With respect to this norm, what are the lengths of
$e_1$, $e_2$, and $v = e_1 + e_2$?]
11. Consider the shear matrix

$$S = \begin{pmatrix} 1 & s \\ 0 & 1 \end{pmatrix}$$

and the linear transformation $S : \mathbb{R}^2 \to \mathbb{R}^2$ it represents. Calculate the norm
and conorm of S. [Hint: Using polar form, it suffices to calculate the norm 
and conorm of the positive definite symmetric part of S. Recall from linear 
algebra that the eigenvalues of the square of a matrix A are the squares of the 
eigenvalues of A.] 

12. What is the one-line proof that if V is a finite-dimensional normed space then 
its unit sphere $\{v : |v| = 1\}$ is compact?

13. The set of invertible n x n matrices is open in M. Is it dense? 

14. An n x n matrix is diagonalizable if there is a change of basis in which it 
becomes diagonal. 

(a) Is the set of diagonalizable matrices open in M (n x n)? 

(b) Closed? 

(c) Dense? 

15. Show that both partial derivatives of the function

$$f(x, y) = \begin{cases} \dfrac{xy}{x^2 + y^2} & \text{if } (x, y) \ne (0, 0) \\[1ex] 0 & \text{if } (x, y) = (0, 0) \end{cases}$$

exist at the origin but the function is not differentiable there.

16. Let $f : \mathbb{R}^2 \to \mathbb{R}^3$ and $g : \mathbb{R}^3 \to \mathbb{R}$ be defined by $f = (x, y, z)$ and $g = w$ where

$$w = w(x, y, z) = xy + yz + zx$$

$$x = x(s, t) = st \qquad y = y(s, t) = s\cos t \qquad z = z(s, t) = s\sin t.$$

(a) Find the matrices that represent the linear transformations $(Df)_p$ and
$(Dg)_q$ where $p = (s_0, t_0) = (0, 1)$ and $q = f(p)$.

(b) Use the Chain Rule to calculate the $1 \times 2$ matrix $[\partial w/\partial s, \partial w/\partial t]$ that
represents $(D(g \circ f))_p$.

(c) Plug the functions $x = x(s, t)$, $y = y(s, t)$, and $z = z(s, t)$ directly into
$w = w(x, y, z)$, and recalculate $[\partial w/\partial s, \partial w/\partial t]$, verifying the answer given
in (b).

(d) Examine the statements of the multivariable chain rules that appear in 
your old calculus book and observe that they are nothing more than the 
components of various product matrices. 

17. Let $f : U \to \mathbb{R}^m$ be differentiable, $[p, q] \subset U \subset \mathbb{R}^n$, and ask whether the direct
generalization of the one-dimensional Mean Value Theorem is true: Does there
exist a point $\theta \in [p, q]$ such that

(28) $f(q) - f(p) = (Df)_\theta (q - p)$?

(a) Take $n = 1$, $m = 2$, and examine the function

$$f(t) = (\cos t, \sin t)$$

for $\pi \le t \le 2\pi$. Take $p = \pi$ and $q = 2\pi$. Show that there is no $\theta \in [p, q]$
which satisfies (28).

(b) Assume that the set of derivatives

$$\{(Df)_x \in \mathcal{L}(\mathbb{R}^n, \mathbb{R}^m) : x \in [p, q]\}$$

is convex. Prove there exists $\theta \in [p, q]$ which satisfies (28).

(c) How does (b) imply the one-dimensional Mean Value Theorem? 
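
The failure in part (a) can be seen numerically before it is proved. The sketch below is an illustration only, with hypothetical helper names `f`, `df`, and `rhs_norm`, and with the interval sampled at 1001 points (an arbitrary choice). It compares the length of $f(q) - f(p)$ with the length of $(Df)_\theta(q - p)$: the first is 2, while the second equals $\pi$ for every $\theta$ because the velocity of the circle has unit length.

```python
import math


def f(t):
    """The curve f(t) = (cos t, sin t)."""
    return (math.cos(t), math.sin(t))


def df(t):
    """Its derivative f'(t) = (-sin t, cos t), a unit vector for every t."""
    return (-math.sin(t), math.cos(t))


p, q = math.pi, 2 * math.pi

# Left side of (28): f(q) - f(p) = (1, 0) - (-1, 0) = (2, 0).
lhs = tuple(fq - fp for fq, fp in zip(f(q), f(p)))
lhs_norm = math.hypot(*lhs)


def rhs_norm(theta):
    """Length of (Df)_theta (q - p); it is pi regardless of theta."""
    return math.hypot(*(component * (q - p) for component in df(theta)))


# Smallest gap between the two lengths over a sample of theta in [p, q].
samples = [p + i * (q - p) / 1000 for i in range(1001)]
min_gap = min(abs(rhs_norm(theta) - lhs_norm) for theta in samples)
```

Since the two vectors never even have the same length, no $\theta$ can satisfy (28); the sampled `min_gap` stays near $\pi - 2$.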

18. The directional derivative of $f : U \to \mathbb{R}^m$ at $p \in U$ in the direction $u$ is the
limit, if it exists,

$$\nabla_p f(u) = \lim_{t \to 0} \frac{f(p + tu) - f(p)}{t}.$$

(Often one requires that $|u| = 1$.)

(a) If $f$ is differentiable at $p$, why is it obvious that the directional derivative
exists in each direction $u$?

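(b) Show that the function $f : \mathbb{R}^2 \to \mathbb{R}$ defined by

$$f(x, y) = \begin{cases} \dfrac{x^3 y}{x^4 + y^2} & \text{if } (x, y) \ne (0, 0) \\[1ex] 0 & \text{if } (x, y) = (0, 0) \end{cases}$$

has $\nabla_{(0,0)} f(u) = 0$ for all $u$ but is not differentiable at $(0, 0)$.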
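
A quick numerical experiment suggests why part (b) works. The sketch below is not a proof; the direction `(1.0, 2.0)` and the step sizes are arbitrary choices made for illustration. Along a straight line through the origin the difference quotient tends to 0, but along the parabola $y = x^2$ the ratio $f(x, y)/|(x, y)|$ tends to $1/2$, which is incompatible with a derivative equal to zero at the origin.

```python
import math


def f(x, y):
    """The function of Exercise 18(b)."""
    if x == 0.0 and y == 0.0:
        return 0.0
    return x**3 * y / (x**4 + y**2)


def directional_quotient(u, t):
    """Difference quotient (f(tu) - f(0)) / t at the origin."""
    return (f(t * u[0], t * u[1]) - f(0.0, 0.0)) / t


def parabola_ratio(x):
    """f(x, x^2) divided by the length of (x, x^2)."""
    return f(x, x * x) / math.hypot(x, x * x)


# Quotients along the direction u = (1, 2) shrink toward 0 ...
quotients = [directional_quotient((1.0, 2.0), 10.0**-k) for k in range(1, 7)]

# ... while the ratio along the parabola approaches 1/2.
ratios = [parabola_ratio(10.0**-k) for k in range(1, 7)]
```

If $f$ were differentiable at the origin, its derivative would have to be zero (all directional derivatives vanish), and then $f(v)/|v|$ would tend to 0 along every approach path, including the parabola.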

19. Using the functions in Exercises 15 and 18, show that the composite of func- 
tions whose partial derivatives exist may fail to have partial derivatives, and 
the composite of functions whose directional derivatives exist may fail to have 
directional derivatives. (That is, the classes of these functions are not closed 
under composition, which is further reason to define multidimensional differ- 
entiability in terms of Taylor approximation, and not in terms of partial or 
directional derivatives.) 

20. Assume that $U$ is a connected open subset of $\mathbb{R}^n$ and $f : U \to \mathbb{R}^m$ is
differentiable everywhere on $U$. If $(Df)_p = 0$ for all $p \in U$, show that $f$ is constant.

21. For $U$ as above, assume that $f$ is second-differentiable everywhere and
$(D^2 f)_p = 0$ for all $p$. What can you say about $f$? Generalize to higher-order
differentiability.

22. If $Y$ is a metric space and $f : [a, b] \times Y \to \mathbb{R}$ is continuous, show that

$$F(y) = \int_a^b f(x, y)\,dx$$

is continuous.
23. Assume that $f : [a, b] \times Y \to \mathbb{R}^m$ is continuous, $Y$ is an open subset of $\mathbb{R}^n$, the
partial derivatives $\partial f_i(x, y)/\partial y_j$ exist, and they are continuous. Let $D_y f$ be the
linear transformation $\mathbb{R}^n \to \mathbb{R}^m$ which is represented by the $m \times n$ matrix of
partials.

(a) Show that

$$F(y) = \int_a^b f(x, y)\,dx$$

is of class $C^1$ and

$$(DF)_y = \int_a^b (D_y f)\,dx.$$

This generalizes Theorem 14 to higher dimensions.

(b) Generalize (a) to higher-order differentiability.

24. Show that all second partial derivatives of the function $f : \mathbb{R}^2 \to \mathbb{R}$ defined by

$$f(x, y) = \begin{cases} \dfrac{xy(x^2 - y^2)}{x^2 + y^2} & \text{if } (x, y) \ne (0, 0) \\[1ex] 0 & \text{if } (x, y) = (0, 0) \end{cases}$$

exist everywhere, but the mixed second partials are unequal at the origin,
$\partial^2 f(0, 0)/\partial x \partial y \ne \partial^2 f(0, 0)/\partial y \partial x$.

*25. Construct an example of a $C^1$ function $f : \mathbb{R} \to \mathbb{R}$ that is second-differentiable
only at the origin. (Infer that this phenomenon occurs also in higher dimen- 
sions.) 

26. Suppose that $u \mapsto \beta_u$ is a continuous function from $U \subset \mathbb{R}^n$ into $\mathcal{L}(\mathbb{R}^m, \mathbb{R}^m)$.

(a) If for all $u \in U$, $\beta_u$ is symmetric, prove that its average over each $W \subset U$
is symmetric.

(b) Conversely, prove that if the average over all small two-dimensional
parallelograms in $U$ is symmetric then $\beta_u$ is symmetric for all $u \in U$. (That is,
if for some $p \in U$, $\beta_p$ is not symmetric, prove that its average over some
small two-dimensional parallelogram at $p$ is also not symmetric.)

(c) Generalize (a) and (b) by replacing $\mathcal{L}$ with a finite-dimensional space $E$,
and the subset of symmetric bilinear maps with a linear subspace of $E$.
The average values of a continuous function always lie in the subspace if
and only if the values do.

*27. Assume that $f : U \to \mathbb{R}^m$ is of class $C^2$ and show that $D^2 f$ is symmetric by
the following integral method. With reference to the signed sum $\Delta$ of $f$ at the
vertices of the parallelogram $P$ in Figure 109, use the $C^1$ Mean Value Theorem
to show that

$$\Delta = \left( \int_0^1 \int_0^1 (D^2 f)_{p + sv + tw}\,ds\,dt \right)(v, w).$$

Infer symmetry of $(D^2 f)_p$ from symmetry of $\Delta$ and Exercise 26.
28. Let $\beta : \mathbb{R}^n \times \cdots \times \mathbb{R}^n \to \mathbb{R}^m$ be $r$-linear. Define its “symmetrization” as

$$\operatorname{symm}(\beta)(v_1, \ldots, v_r) = \frac{1}{r!} \sum_{\pi} \beta(v_{\pi(1)}, \ldots, v_{\pi(r)}),$$

where $\pi$ ranges through the set of permutations of $\{1, \ldots, r\}$.

(a) Prove that $\operatorname{symm}(\beta)$ is symmetric.

(b) If $\beta$ is symmetric prove that $\operatorname{symm}(\beta) = \beta$.

(c) Is the converse to (b) true?

(d) Prove that $\alpha = \beta - \operatorname{symm}(\beta)$ is antisymmetric in the sense that if $\pi$ is any
permutation of $\{1, \ldots, r\}$ then

$$\alpha(v_{\pi(1)}, \ldots, v_{\pi(r)}) = \operatorname{sgn}(\pi)\,\alpha(v_1, \ldots, v_r).$$

Infer that $\mathcal{L}^r = \mathcal{L}^r_s \oplus \mathcal{L}^r_a$ where $\mathcal{L}^r_s$ and $\mathcal{L}^r_a$ are the subspaces of symmetric
and antisymmetric $r$-linear transformations.

(e) Let $\beta \in \mathcal{L}^2(\mathbb{R}^2, \mathbb{R})$ be defined by

$$\beta((x, y), (x', y')) = xy'.$$

Express $\beta$ as the sum of a symmetric and an antisymmetric bilinear
transformation.
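
The symmetrization operator is easy to experiment with for small $r$. The following minimal sketch assumes multilinear maps are represented as ordinary Python functions of $r$ vector arguments; the names `symm`, `sym_part`, and `alt_part`, and the sample vectors, are illustrative only. It applies the averaging formula to the bilinear form of part (e).

```python
from itertools import permutations
from math import factorial


def symm(beta, r):
    """Return the symmetrization of an r-linear function beta."""
    def symmetrized(*vectors):
        # Average beta over all reorderings of its r arguments.
        total = sum(
            beta(*(vectors[i] for i in perm)) for perm in permutations(range(r))
        )
        return total / factorial(r)
    return symmetrized


def beta(v, w):
    """The bilinear form beta((x, y), (x', y')) = x * y' from part (e)."""
    return v[0] * w[1]


sym_part = symm(beta, 2)


def alt_part(v, w):
    """The remainder alpha = beta - symm(beta)."""
    return beta(v, w) - sym_part(v, w)


v, w = (1.0, 2.0), (3.0, 5.0)
values = (sym_part(v, w), sym_part(w, v), alt_part(v, w), alt_part(w, v))
```

For these vectors the symmetric part takes the same value on $(v, w)$ and $(w, v)$, while the remainder changes sign, which is the decomposition that part (e) asks to be written in closed form.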

*29. Prove Corollary 18 that $r^{\text{th}}$-order differentiability implies symmetry of $D^r f$,
$r \ge 3$, in one of two ways.

(a) Use induction to show that $(D^r f)_p(v_1, \ldots, v_r)$ is symmetric with respect
to permutations of $v_1, \ldots, v_{r-1}$ and of $v_2, \ldots, v_r$. Then take advantage of
the fact that $r$ is strictly greater than 2.

(b) Define the signed sum $\Delta$ of $f$ at the vertices of the paralleletope $P$ spanned
by $v_1, \ldots, v_r$, and show that it is the average of $D^r f$. Then proceed as in
Exercise 27.

30. Consider the equation 


(29) $xe^y + ye^x = 0$.

(a) Observe that there is no way to write down an explicit solution $y = y(x)$
of (29) in a neighborhood of the point $(x_0, y_0) = (0, 0)$.

(b) Why, nevertheless, does there exist a $C^\infty$ solution $y = y(x)$ of (29) near
$(0, 0)$?

(c) What is its derivative at $x = 0$?

(d) What is its second derivative at $x = 0$?

(e) What does this tell you about the graph of the solution? 

(f) Do you see the point of the Implicit Function Theorem better? 
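
Although (29) has no explicit solution, the implicit solution can be explored numerically. The sketch below is only an illustration under stated assumptions: it solves $xe^y + ye^x = 0$ for $y$ by Newton's method at points near $x = 0$ and estimates $y'(0)$ and $y''(0)$ by central differences. The step `h`, the iteration count, and the function names are arbitrary choices, not part of the exercise.

```python
import math


def F(x, y):
    """Left side of equation (29)."""
    return x * math.exp(y) + y * math.exp(x)


def F_y(x, y):
    """Partial derivative of F with respect to y."""
    return x * math.exp(y) + math.exp(x)


def solve_y(x, y0=0.0, steps=50):
    """Solve F(x, y) = 0 for y near y0 by Newton's method."""
    y = y0
    for _ in range(steps):
        y -= F(x, y) / F_y(x, y)
    return y


h = 1e-3
y_plus, y_zero, y_minus = solve_y(h), solve_y(0.0), solve_y(-h)

# Central-difference estimates of y'(0) and y''(0).
dy = (y_plus - y_minus) / (2 * h)
d2y = (y_plus - 2 * y_zero + y_minus) / h**2
```

Implicit differentiation of (29) gives $y'(0) = -1$ and $y''(0) = 4$, and the numerical estimates `dy` and `d2y` land close to those values; Newton's method is usable here precisely because $\partial F/\partial y = 1 \ne 0$ at the origin, which is also the hypothesis of the Implicit Function Theorem.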

**31. Consider a function $f : U \to \mathbb{R}$ such that

(i) $U$ is a connected open subset of $\mathbb{R}^2$.

(ii) $f$ is $C^1$.

(iii) For each $(x, y) \in U$ we have

$$\frac{\partial f(x, y)}{\partial y} = 0.$$

(a) If $U$ is a disc show that $f$ is independent of $y$.

(b) Construct such an $f$ of class $C^\infty$ which does depend on $y$.

(c) Show that the $f$ in (b) can not be analytic.

(d) Why does your example in (b) not invalidate the proof of the Rank The- 
orem on page 306? 

32. Let G denote the set of invertible n x n matrices. 

(a) Prove that G is an open subset of M(n x n). 

(b) Prove that G is a group. (It is called the general linear group.) 

(c) Prove that the inversion operator $\operatorname{Inv} : A \mapsto A^{-1}$ is a homeomorphism of
$G$ onto $G$.

(d) Prove that Inv is a diffeomorphism and show that its derivative at $A$ is
the linear transformation $\mathcal{M} \to \mathcal{M}$,

$$X \mapsto -A^{-1} \circ X \circ A^{-1}.$$

(e) Relate this formula to the ordinary derivative of 1/x at x = a. 
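
The derivative formula in part (d) can be sanity-checked with a finite difference. The sketch below is an illustration, not a proof: it works only with $2 \times 2$ matrices stored as nested lists, and the matrices `A`, `X` and the step `t` are arbitrary sample values. It compares $(\operatorname{Inv}(A + tX) - \operatorname{Inv}(A))/t$ with $-A^{-1} X A^{-1}$.

```python
def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
        for i in range(2)
    ]


def inv(a):
    """Inverse of a 2x2 matrix by the adjugate formula."""
    d = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / d, -a[0][1] / d], [-a[1][0] / d, a[0][0] / d]]


A = [[2.0, 1.0], [1.0, 3.0]]
X = [[0.5, -1.0], [2.0, 0.25]]
t = 1e-6

A_inv = inv(A)
shifted_inv = inv([[A[i][j] + t * X[i][j] for j in range(2)] for i in range(2)])

# Finite-difference quotient versus the claimed derivative -A^{-1} X A^{-1}.
quotient = [
    [(shifted_inv[i][j] - A_inv[i][j]) / t for j in range(2)] for i in range(2)
]
predicted = [[-entry for entry in row] for row in matmul(matmul(A_inv, X), A_inv)]
max_error = max(
    abs(quotient[i][j] - predicted[i][j]) for i in range(2) for j in range(2)
)
```

In the $1 \times 1$ case the formula reads $x \mapsto -x/a^2$, which is the familiar derivative of $1/x$ at $x = a$ applied to the increment $x$.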

33. Observe that $Y = \operatorname{Inv} X$ solves the implicit function problem

$$F(X, Y) - I = 0,$$

where $F(X, Y) = X \circ Y$. Assume it is known that Inv is smooth and use the
Chain Rule to derive from this equation the formula for the derivative of Inv. 

34. Use Gaussian elimination to prove that the entries of the matrix $A^{-1}$ depend
smoothly (in fact analytically) on the entries of A. 

*35. Give a proof that the inversion operator Inv is analytic (i.e., is defined locally 
by a convergent power series) as follows: 

(a) If $T \in \mathcal{L}(\mathbb{R}^n, \mathbb{R}^n)$ and $\|T\| < 1$ show that the series of linear
transformations

$$I + T + T^2 + \cdots + T^k + \cdots$$

converges to a linear transformation $S$, and

$$S \circ (I - T) = I = (I - T) \circ S,$$

where $I$ is the identity transformation.

(b) Infer from (a) that inversion is analytic at $I$.

(c) In general, if $T_0 \in G$ and $\|T\| < 1/\|T_0^{-1}\|$ show that

$$\operatorname{Inv}(T_0 - T) = \operatorname{Inv}(I - T_0^{-1} \circ T) \circ T_0^{-1},$$

and infer that Inv is analytic at $T_0$.

(d) Infer from the general fact that analyticity implies smoothness that
inversion is smooth.

(Note that this proof avoids Cramer’s Rule and makes no use of finite-dimensionality.)
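
The geometric series in part (a) can be watched converging on a small example. The sketch below is an illustration under stated assumptions: the matrix `T` is an arbitrary sample whose entries are small enough that its operator norm is below 1 (its Frobenius norm is about 0.55), and 60 terms is an arbitrary truncation. It forms the partial sum $I + T + \cdots + T^{60}$ and checks that it acts as a two-sided inverse of $I - T$ up to rounding.

```python
def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def identity(n):
    """The n x n identity matrix."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def neumann_sum(t, terms):
    """Partial sum I + T + T^2 + ... + T^terms."""
    n = len(t)
    total = identity(n)
    power = identity(n)
    for _ in range(terms):
        power = matmul(power, t)  # next power of T
        total = [[total[i][j] + power[i][j] for j in range(n)] for i in range(n)]
    return total


T = [[0.2, 0.1], [-0.3, 0.4]]
S = neumann_sum(T, 60)
I2 = identity(2)
I_minus_T = [[I2[i][j] - T[i][j] for j in range(2)] for i in range(2)]

left = matmul(S, I_minus_T)
right = matmul(I_minus_T, S)
residual = max(
    max(abs(left[i][j] - I2[i][j]), abs(right[i][j] - I2[i][j]))
    for i in range(2)
    for j in range(2)
)
```

The residual shrinks like $\|T\|^{k+1}$ as more terms are added, mirroring the tail estimate used to prove convergence of the series.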
*36. Give a proof of smoothness of Inv by the following bootstrap method. 

(a) Using the identity

$$X^{-1} - Y^{-1} = X^{-1} \circ (Y - X) \circ Y^{-1}$$

give a simple proof that Inv is continuous.

(b) Infer that $Y = \operatorname{Inv}(X)$ is a continuous solution of the $C^\infty$ implicit function
problem

$$F(X, Y) - I = 0,$$

where $F(X, Y) = X \circ Y$ as in Exercise 33. Since the proof of the $C^1$
Implicit Function Theorem relies only on continuity of Inv, it is not circular
reasoning to conclude that Inv is $C^1$.

(c) Assume simultaneously that the $C^r$ Implicit Function Theorem has been
proved and that Inv is known to be $C^{r-1}$. Prove that Inv is $C^r$ and that
the $C^{r+1}$ Implicit Function Theorem is true.

(d) Conclude logically that Inv is smooth and the $C^\infty$ Implicit Function
Theorem is true.

Note that this proof avoids Cramer’s Rule and makes no use of finite dimen- 
sionality. 

*37. Use polar decomposition to give an alternate proof of the volume-multiplier 
formula. 

**38. Consider the set $S$ of all $2 \times 2$ matrices $X \in \mathcal{M}$ that have rank 1.

(a) Show that in a neighborhood of the matrix 



0 

0 



S is diffeomorphic to a two-dimensional disc. 

(b) Is this true (locally) for all matrices $X \in S$?

(c) Describe S globally. (How many connected components does it have? Is 
it closed in M? If not, what are its limit points and how does S approach 
them? What is the intersection of S with the unit sphere in M?, etc.) 

39. Draw pictures of all the possible shapes of $T(S^2)$ where $T : \mathbb{R}^3 \to \mathbb{R}^3$ is a linear
transformation and $S^2$ is the 2-sphere. (Don’t forget the cases in which $T$ has
rank $< 3$.)
40. Let $0 < \epsilon < 1$ and $a, b > 0$ be given.

(a) Prove that 


a 


1 + e 


< j<d+ £ ) 


a — b < 16 eb. 


(b) Is the estimate in (a) sharp? (That is, can 16 be replaced by a smaller 
constant?) 

**41. Suppose that $f$ and $g$ are $r^{\text{th}}$-order differentiable and that the composite
$h = g \circ f$ makes sense. A partition divides a set into nonempty disjoint subsets.
Prove the Higher Order Chain Rule,

$$(D^r h)_p = \sum_{k=1}^{r} \sum_{\mu \in P(k, r)} (D^k g)_q \circ (D^\mu f)_p$$

where $\mu$ partitions $\{1, \ldots, r\}$ into $k$ subsets, and $q = f(p)$. In terms of $r$-linear
transformations, this notation means

$$(D^r h)_p(v_1, \ldots, v_r) = \sum_{k=1}^{r} \sum_{\mu} (D^k g)_q\big((D^{|\mu_1|} f)_p(v_{\mu_1}), \ldots, (D^{|\mu_k|} f)_p(v_{\mu_k})\big)$$

where $|\mu_i| = \#\mu_i$ and $v_{\mu_i}$ is the $|\mu_i|$-tuple of vectors $v_j$ with $j \in \mu_i$. (Symmetry
implies that the order of the vectors $v_j$ in the $|\mu_i|$-tuple and the order in
which the partition blocks $\mu_i$ occur are irrelevant.)

**42. Suppose that $\beta$ is bilinear and $\beta(f, g)$ makes sense. If $f$ and $g$ are $r^{\text{th}}$-order
differentiable at $p$, find the Higher-Order Leibniz Formula for $D^r(\beta(f, g))_p$.
[Hint: First derive the formula in dimension 1.]

43. Suppose that $T : \mathbb{R}^n \to \mathbb{R}^m$ has rank $k$.

(a) Show there exists a $\delta > 0$ such that if $S : \mathbb{R}^n \to \mathbb{R}^m$ and $\|S - T\| < \delta$ then
$S$ has rank $\ge k$.

(b) Give a specific example in which the rank of $S$ can be greater than the
rank of $T$, no matter how small $\delta$ is.

(c) Give examples of linear transformations of rank $k$ for each $k$ where
$0 \le k \le \min\{n, m\}$.

44. Let $S \subset M$ be given.

(a) Define the characteristic function $\chi_S : M \to \mathbb{R}$.

(b) If $M$ is a metric space, show that $\chi_S(x)$ is discontinuous at $x$ if and only
if $x$ is a boundary point of $S$.

45. On page 315 there is a definition of $Z \subset \mathbb{R}^2$ being a zero set that involves open
rectangles.

(a) Show that the definition is unaffected if we require that the rectangles
covering $Z$ are open squares.

(b) What if we permit the squares or rectangles to be nonopen?

(c) What if we use discs or other shapes instead of squares and rectangles?

*46. Assume that $S \subset \mathbb{R}^2$ is bounded.

(a) Prove that if $S$ is Riemann measurable then so are its interior and closure.

(b) Suppose that the interior and closure of $S$ are Riemann measurable and
$|\operatorname{int}(S)| = |\overline{S}| < \infty$. Prove that $S$ is Riemann measurable.

(c) Show that some open bounded subsets of $\mathbb{R}^2$ are not Riemann measurable.
See Appendix E in Chapter 6.

*47. In the derivation of Fubini’s Theorem on page 316, it is observed that for all
$y \in [c, d] \setminus Y$, where $Y$ is a zero set, the lower and upper integrals with respect
to $x$ agree, $\underline{F}(y) = \overline{F}(y)$. One might think that the values of $\underline{F}$ and $\overline{F}$ on $Y$
have no effect on their integrals. Not so. Consider the function defined on the
unit square $[0, 1] \times [0, 1]$,

$$f(x, y) = \begin{cases}
1 & \text{if } y \text{ is irrational} \\
1 & \text{if } y \text{ is rational and } x \text{ is irrational} \\
1 - 1/q & \text{if } y \text{ is rational and } x = p/q \text{ is rational and written in lowest terms.}
\end{cases}$$

(a) Show that $f$ is Riemann integrable and its integral is 1.

(b) Observe that if $Y$ is the zero set $\mathbb{Q} \cap [0, 1]$ then for each $y \notin Y$,

$$\int_0^1 f(x, y)\,dx$$

exists and equals 1.

(c) Observe that if for each $y \in Y$ we choose in a completely arbitrary manner
some

$$h(y) \in [\underline{F}(y), \overline{F}(y)]$$

and set

$$H(y) = \begin{cases} \underline{F}(y) = \overline{F}(y) & \text{if } y \notin Y \\ h(y) & \text{if } y \in Y \end{cases}$$

then the integral of $H$ exists and equals 1, but if we take $g(y) = 0$ for all $y \in Y$
then the integral of

$$G(y) = \begin{cases} \underline{F}(y) = \overline{F}(y) & \text{if } y \notin Y \\ g(y) = 0 & \text{if } y \in Y \end{cases}$$

does not exist.

***48. Is there a criterion to decide which redefinitions of the Riemann integral on the 
zero set Y of Exercise 47 are harmless and which are not? 
49. Using the Fundamental Theorem of Calculus, give a direct proof of Green’s
Formulas

$$-\iint_R f_y\,dx\,dy = \int_{\partial R} f\,dx \quad\text{and}\quad \iint_R g_x\,dx\,dy = \int_{\partial R} g\,dy$$

where $R$ is a square in the plane and $f, g : \mathbb{R}^2 \to \mathbb{R}$ are smooth. (Assume that
the edges of the square are parallel to the coordinate axes.)

50. Draw a staircase curve $S_n$ that approximates the diagonal

$$\Delta = \{(x, y) \in \mathbb{R}^2 : 0 \le x = y \le 1\}$$

to within a tolerance $1/n$. See Figure 133. Suppose that $f, g : \mathbb{R}^2 \to \mathbb{R}$ are
smooth.

Figure 133 The staircase curve approximating the diagonal consists of
both treads and risers.

(a) Why does the length of $S_n$ not converge to the length of $\Delta$ as $n \to \infty$?

(b) Despite (a), prove that

$$\int_{S_n} f\,dx \to \int_{\Delta} f\,dx \quad\text{and}\quad \int_{S_n} g\,dy \to \int_{\Delta} g\,dy$$

as $n \to \infty$.

(c) Repeat (b) with $\Delta$ replaced by the graph of a smooth function $h : [a, b] \to \mathbb{R}$.

(d) If $C$ is a smooth simple closed curve in the plane, show that it is the union
of finitely many arcs $C_\ell$, each of which is the graph of a smooth function
$y = h(x)$ or $x = h(y)$, and the arcs $C_\ell$ meet only at common endpoints.
(e) Infer that if $(S_n)$ is a sequence of staircase curves that converges to $C$ then

$$\int_{S_n} f\,dx + g\,dy \to \int_{C} f\,dx + g\,dy.$$

(f) Use (e) and Exercise 49 to give a proof of Green’s Formulas on a general
region $D \subset \mathbb{R}^2$ bounded by a smooth simple closed curve $C$, that relies
on approximating† $C$, say from the inside, by staircase curves $S_n$ which
bound regions $R_n$ composed of many small squares. (You may imagine
that $R_1 \subset R_2 \subset \ldots$ and that $R_n \to D$.)

51. A region $R$ in the plane is of type 1 if there are smooth functions $g_1 : [a, b] \to \mathbb{R}$,
$g_2 : [a, b] \to \mathbb{R}$ such that $g_1(x) \le g_2(x)$ and

$$R = \{(x, y) : a \le x \le b \text{ and } g_1(x) \le y \le g_2(x)\}.$$

R is of type 2 if the roles of x and y can be reversed, and it is a simple region 
if it is of both type 1 and type 2. 

(a) Give an example of a region that is type 1 but not type 2. 

(b) Give an example of a region that is neither type 1 nor type 2. 

(c) Is every simple region starlike? Convex? 

(d) If a convex region is bounded by a smooth simple closed curve, is it simple?

(e) Give an example of a region that divides into three simple subregions but 
not into two. 

*(f) If a region is bounded by a smooth simple closed curve C then it need not 
divide into a finite number of simple subregions. Find an example. 

(g) Infer that the standard proof of Green’s Formulas for simple regions (as, 
for example, in J. Stewart’s Calculus ) does not immediately carry over to 
the general planar region R with smooth boundary; i.e., cutting R into 
simple regions can fail. 

***(h) Is there a planar region bounded by a smooth simple closed curve such 
that for every linear coordinate system (i.e., a new pair of axes), the region 
does not divide into finitely many simple subregions? In other words, is 
Stewart’s proof of Green’s Theorem doomed? 

*(i) Show that if the curve C in (f) is analytic, then no such example exists. 
[Hint: C is analytic if it is locally the graph of a function defined by a 
convergent power series. A nonconstant analytic function has the property 
that for each $x$, there is some derivative of $f$ which is nonzero, $f^{(r)}(x) \ne 0$.]
**52. Show that every starlike open subset of the plane is diffeomorphic to the plane. 
(The same is true in R n .) 

†This staircase approximation proof generalizes to regions that are bounded by fractal,
nondifferentiable curves such as the von Koch snowflake. As Jenny Harrison has shown, it also generalizes to
higher dimensions.
**53. The 2-cell $\varphi : I^n \to B^n$ constructed in Step 3 of the proof of Brouwer’s Theorem
is smooth but not one-to-one. For it crushes the corners of $I^n$ into $\partial B$.

(a) Construct a homeomorphism $h : I^2 \to B^2$ where $I^2$ is the closed unit
square and $B^2$ is the closed unit disc.

(b) In addition make $h$ in (a) be of class $C^1$ (on the closed square) and be
a diffeomorphism from the interior of $I^2$ onto the interior of $B^2$. (The
derivative of a diffeomorphism is everywhere nonsingular.)

(c) Why can $h$ not be a diffeomorphism from $I^2$ onto $B^2$?

(d) Improve class $C^1$ in (b) to class $C^\infty$.

**54. If $K, L \subset \mathbb{R}^n$ and if there is a homeomorphism $h : K \to L$ that extends to
$H : U \to V$ such that $U, V \subset \mathbb{R}^n$ are open, $H$ is a homeomorphism, and $H$, $H^{-1}$
are of class $C^r$ with $1 \le r \le \infty$ then we say that $K$ and $L$ are ambiently
$C^r$-diffeomorphic.

(a) In the plane, prove that the closed unit square is ambiently diffeomorphic 
to a general rectangle and to a general parallelogram. 

(b) If K, L are ambiently diffeomorphic polygons in the plane, prove that K 
and L have the same number of vertices. (Do not count vertices at which 
the interior angle is 180 degrees.) 

(c) Prove that the closed square and closed disc are not ambiently diffeomor- 
phic. 

(d) If K is a convex polygon that is ambiently diffeomorphic to a polygon L, 
prove that L is convex. 

(e) Is the converse to (b) true or false? What about in the convex case? 

(f) The closed disc is tiled by five ambiently diffeomorphic copies of the unit 
square as shown in Figure 134. Prove that it cannot be tiled by fewer. 

(g) Generalize to dimension $n \ge 3$ and show that the $n$-ball can be tiled by
$2n + 1$ diffeomorphs of the $n$-cube. Can it be done with fewer?

(h) Show that a triangle can be tiled by three diffeomorphs of the square. 
Infer that any surface that can be tiled by diffeomorphs of the triangle 
can also be tiled by diffeomorphs of the square. What happens in higher 
dimensions? 

55. Choose at random $I$, $J$, two triples of integers between 1 and 9. Check that
$dx_I \wedge dx_J = dx_{IJ}$.

56. True or false? For every $k$-form $\alpha$ we have $\alpha \wedge \alpha = 0$.

57. Show that $d : \Omega^k \to \Omega^{k+1}$ is a linear vector space homomorphism.

Figure 134 Five diffeomorphs of the square tile the disc.

58. Using Stokes’ Formula (but not the Poincaré Lemma and its consequences),
prove that closed 1-forms are exact (i.e., $d\omega = 0 \Rightarrow \omega = dh$ for some $h$) when
defined on $\mathbb{R}^2$ or on any convex open subset of $\mathbb{R}^2$ as follows.

(a) If $\varphi, \psi : [0, 1] \to U$ are paths from $p$ to $q$, define

$$\sigma(s, t) = (1 - s)\varphi(t) + s\psi(t)$$

for $0 \le s, t \le 1$ and observe it is a smooth 2-cell.

(b) If $\omega = f\,dx + g\,dy$ is a closed 1-form, how does Stokes’ Formula imply
$\int_\varphi \omega = \int_\psi \omega$ and what does this mean about path independence?

(c) Show that if $p$ is held fixed then

$$h(q) = \int_\varphi \omega$$

is smooth and $dh = \omega$.

(d) What if U is nonconvex but diffeomorphic to R 2 ? 

(e) What about higher dimensions? 

*59. For 0 < a < b the spherical shell is the set 


$$U = \{(x, y, z) \in \mathbb{R}^3 : a^2 < x^2 + y^2 + z^2 < b^2\}.$$

It is the open region between spheres of radius $a$ and $b$. If $C$ is any closed
curve in $U$ (i.e., the image of a continuous map $\gamma : S^1 \to U$), show that $C$
can be shrunk to a point without leaving U. That is, U is simply connected. 
[Hint: Why is there a point of U not in C, and how does this help? Gazing at 
Figure 135 may be a good idea.] 

*60. Prove that the closure of the spherical shell is simply connected. 

61. True or false? If $\omega$ is a $k$-form and $k$ is odd, then $\omega \wedge \omega = 0$. What if $k$ is even
and $\ge 2$?
62. Does there exist a continuous mapping from the circle to itself that has no
fixed-point? What about the 2-torus? The 2-sphere? 

63. Show that a smooth map $T : U \to V$ induces a linear map of cohomology
groups $H^k(V) \to H^k(U)$ defined by

$$T^* : [\omega] \mapsto [T^*\omega].$$

Here, $[\omega]$ denotes the equivalence class of $\omega \in Z^k(V)$ in $H^k(V)$. The question
amounts to showing that the pullback of a closed form $\omega$ is closed and that its
cohomology class depends only on the cohomology class of $\omega$.†

64. Prove that diffeomorphic open sets have isomorphic cohomology groups.

65. Show that the 1-form defined on $\mathbb{R}^2 \setminus \{(0, 0)\}$ by

$$\omega = \frac{-y}{r^2}\,dx + \frac{x}{r^2}\,dy$$

is closed but not exact. Why do you think that this 1-form is often referred to
as $d\theta$ and why is the name problematic?
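
One way to get a feel for the non-exactness claim in Exercise 65 is to integrate $\omega$ numerically around the unit circle. The sketch below is an illustration only: it approximates the line integral by a midpoint rule on 10000 chords (an arbitrary choice), and the function names are hypothetical. An exact form $dh$ would integrate to zero around any closed curve, whereas this integral comes out close to $2\pi$.

```python
import math


def omega(x, y, dx, dy):
    """Value of the 1-form (-y dx + x dy) / r^2 on the displacement (dx, dy)."""
    r2 = x * x + y * y
    return (-y * dx + x * dy) / r2


def circle_integral(steps=10000):
    """Midpoint-rule approximation of the integral of omega over the unit circle."""
    total = 0.0
    for i in range(steps):
        t0 = 2 * math.pi * i / steps
        t1 = 2 * math.pi * (i + 1) / steps
        tm = 0.5 * (t0 + t1)
        # Evaluate the form at the midpoint of the arc, on the chord displacement.
        x, y = math.cos(tm), math.sin(tm)
        dx = math.cos(t1) - math.cos(t0)
        dy = math.sin(t1) - math.sin(t0)
        total += omega(x, y, dx, dy)
    return total


winding = circle_integral()
```

The value $2\pi$ is the total change of the polar angle over one counterclockwise circuit, which hints at both the name $d\theta$ and the difficulty with it: $\theta$ is not a single-valued function on the punctured plane.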

66. Let $H \subset \mathbb{R}^3$ be the helicoid $\{(x, y, z) : x^2 + y^2 \ne 0 \text{ and } z = \arctan y/x\}$ and let
$\pi : H \to \mathbb{R}^2 \setminus \{(0, 0)\}$ be the projection $(x, y, z) \mapsto (x, y)$.

(a) For $\omega = (x\,dy - y\,dx)/r^2$ as in Exercise 65, why is $\pi^*\omega$ a closed 1-form on
$H$?

(b) Is it exact? That is, does there exist a smooth function $f : H \to \mathbb{R}$ such
that $df = \omega$?

(c) Is there more than one?

(d) Is there more than one such that $f(1, 0, 0) = 0$?

67. Show that the 2-form defined on the spherical shell by

$$\omega = \frac{x}{r^3}\,dy \wedge dz + \frac{y}{r^3}\,dz \wedge dx + \frac{z}{r^3}\,dx \wedge dy$$

is closed but not exact.

68. True or false: If $\omega$ is closed then $f\omega$ is closed.

True or false: If $\omega$ is exact then $f\omega$ is exact.

69. Is the wedge product of closed forms closed? Of exact forms exact? What 
about the product of a closed form and an exact form? Does this give a ring 
structure to the cohomology classes? 


†A fancier way to present the proof of the Brouwer Fixed Point Theorem goes like this: As
always, the question reduces to showing that there is no smooth retraction $T$ of the $n$-ball to its
boundary. Such a $T$ would give a cohomology map $T^* : H^k(\partial B) \to H^k(B)$ where the cohomology
groups of $\partial B$ are those of its spherical shell neighborhood. The map $T^*$ is seen to be a cohomology
group isomorphism because $T \circ \text{inclusion}_{\partial B} = \text{inclusion}_{\partial B}$ and $\text{inclusion}^*_{\partial B} = \text{identity}$. But when
$k = n - 1 \ge 1$ the cohomology groups are nonisomorphic; they are computed to be $H^{n-1}(\partial B) = \mathbb{R}$
and $H^{n-1}(B) = 0$.
70. Prove that the $n$-cell $[-1, 1]^n \to B^n$ in the proof of the Brouwer Fixed-Point
Theorem has Jacobian $\rho'(r)\rho(r)^{n-1}/r^{n-1}$ for $r = |v|$ as claimed on page 355.

**71. The Hairy Ball Theorem states that any continuous vector field $X$ in $\mathbb{R}^3$
that is everywhere tangent to the 2-sphere $S$ is zero at some point of $S$. Here
is an outline of a proof for you to fill in. (If you imagine the vector field as hair
combed on a sphere, there must be a cowlick somewhere.)

(a) Show that the Hairy Ball Theorem is equivalent to a fixed-point assertion:
Every continuous map of $S$ to itself that is sufficiently close to the
identity map $S \to S$ has a fixed-point. (This is not needed below but it is
interesting.)

(b) If a continuous vector field on $S$ has no zero on or inside a small simple
closed curve $C \subset S$, show that the net angular turning of $X$ along $C$
as judged by an observer who takes a tour of $C$ in the counterclockwise
direction is $-2\pi$. (The observer walks along $C$ in the counterclockwise
direction when $S$ is viewed from the outside, and he measures the angle
that $X$ makes with respect to his own tangent vector as he walks along
$C$. By convention, clockwise angular variation is negative.) Show also
that the net turning is $+2\pi$ if the observer walks along $C$ in the clockwise
direction.

(c) If $C_t$ is a continuous family of simple closed curves on $S$, $a \le t \le b$, and
if $X$ never equals zero at points of $C_t$, show that the net angular turning
of $X$ along $C_t$ is independent of $t$. (This is a case of a previous exercise
stating that a continuous integer-valued function of $t$ is constant.)

(d) Imagine the following continuous family of simple closed curves $C_t$. For
$t = 0$, $C_0$ is the Arctic Circle. For $0 \le t \le 1/2$, the latitude of $C_t$
decreases while its circumference increases as it oozes downward, becomes
the Equator, and then grows smaller until it becomes the Antarctic Circle
when $t = 1/2$. For $1/2 \le t \le 1$, $C_t$ maintains its size and shape, but its new
center, the South Pole, slides up the Greenwich Meridian until at $t = 1$,
$C_t$ regains its original arctic position. See Figure 135. Its orientation has
reversed. Orient the Arctic Circle $C_0$ positively and choose an orientation
on each $C_t$ that depends continuously on $t$. To reach a contradiction,
suppose that $X$ has no zero on $S$.

(i) Why is the total angular turning of $X$ along $C_0$ equal to $-2\pi$?

(ii) Why is it $+2\pi$ on $C_1$?

(iii) Why is this a contradiction to (c) unless X has a zero somewhere? 

(iv) Conclude that you have proved the Hairy Ball Theorem.

Figure 135 A deformation of the Arctic Circle that reverses its orientation.


6 

Lebesgue Theory 


This chapter presents a geometric theory of Lebesgue measure and integration. In 
calculus you certainly learned that the integral is the area under the curve. With
a good definition of area, that is the point of view I advance here. Deriving the
basic theory of Lebesgue integration then becomes a matter of inspecting the right 
picture. See Appendix E for the geometric relation between Riemann integration and 
Lebesgue integration. 

Throughout the chapter definitions and theorems are stated in $\mathbb{R}^n$ but proved in
$\mathbb{R}^2$. Multidimensionality can complicate a proof’s notation but never its logic.


1 Outer Measure 

How should you measure the length of a subset of the line? If the set to be measured 
is simple, so is the answer. The length of the interval $(a, b)$ is $b - a$. But what is
the length of the set of rational numbers? of the Cantor set? As is often the case in 
analysis we proceed by inequalities and limits. In fact one might distinguish the fields 
of algebra and analysis solely according to their use of equalities versus inequalities. 


Definition The length of an interval $I = (a, b)$ is $b - a$. It is denoted $|I|$. The
Lebesgue outer measure of a set $A \subset \mathbb{R}$ is

$$m^*A = \inf\Big\{ \sum_k |I_k| : \{I_k\} \text{ is a covering of } A \text{ by open intervals} \Big\}.$$

Tacitly we assume that the covering is countable; the series $\sum_k |I_k|$ is its total
length. (Recall that “countable” means either finite or denumerable.) The outer
measure of $A$ is the infimum of the total lengths of all possible coverings $\{I_k\}$ of $A$
by open intervals. If every series $\sum_k |I_k|$ diverges then by definition $m^*A = \infty$.


Outer measure is defined for every $A \subset \mathbb{R}$. It measures $A$ from the outside as do
calipers. A dual approach measures $A$ from the inside. It is called inner measure,
is denoted $m_*A$, and is discussed in Section 4.

Three properties of outer measure (the “axioms of outer measure”) are easy to 
check. 


1 Theorem (a) The outer measure of the empty set is 0, $m^*\emptyset = 0$.

(b) If $A \subset B$ then $m^*A \le m^*B$.

(c) If $A = \bigcup_{n=1}^{\infty} A_n$ then $m^*A \le \sum_{n=1}^{\infty} m^*A_n$.

Proof (b) and (c) are called monotonicity and countable subadditivity. 

(a) This is obvious. Every interval covers the empty set. 

(b) This is obvious. Every covering of B is also a covering of A. 

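(c) This uses the $\epsilon/2^n$ trick. Given $\epsilon > 0$ there exists for each $n$ a covering
$\{I_{k,n} : k \in \mathbb{N}\}$ of $A_n$ such that

$$\sum_{k=1}^{\infty} |I_{k,n}| < m^*A_n + \frac{\epsilon}{2^n}.$$

The collection $\{I_{k,n} : k, n \in \mathbb{N}\}$ covers $A$ and

$$\sum_{k,n} |I_{k,n}| = \sum_{n=1}^{\infty} \sum_{k=1}^{\infty} |I_{k,n}| \le \sum_{n=1}^{\infty} \Big( m^*A_n + \frac{\epsilon}{2^n} \Big) = \sum_{n=1}^{\infty} m^*A_n + \epsilon.$$

Thus the infimum of the total lengths of coverings of $A$ by open intervals is $\le
\sum_n m^*A_n + \epsilon$, and since $\epsilon > 0$ is arbitrary the infimum is $\le \sum_n m^*A_n$, which is
what (c) asserts. □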
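
The $\epsilon/2^n$ trick also answers the earlier question about the rationals, at least for any finite list of them. The sketch below is only an illustration under stated assumptions: the helper names, the bound 12 on denominators, and the value of `eps` are all hypothetical choices, and exact rational arithmetic is used so that no rounding enters. It covers the k-th listed point by an open interval of length `eps / 2**k` and checks that the total length stays below `eps`.

```python
from fractions import Fraction

def rationals_in_unit_interval(max_denominator):
    """List the distinct rationals p/q in [0, 1] with q <= max_denominator."""
    seen = set()
    ordered = []
    for q in range(1, max_denominator + 1):
        for p in range(q + 1):
            r = Fraction(p, q)
            if r not in seen:
                seen.add(r)
                ordered.append(r)
    return ordered

def cover_with_small_intervals(points, eps):
    """Cover the k-th point (k = 1, 2, ...) by an open interval of length eps / 2**k."""
    cover = []
    for k, x in enumerate(points, start=1):
        half = eps / 2 ** (k + 1)  # half of the interval length eps / 2**k
        cover.append((x - half, x + half))
    return cover

eps = Fraction(1, 100)            # assumed tolerance
points = rationals_in_unit_interval(12)
cover = cover_with_small_intervals(points, eps)
total_length = sum(b - a for a, b in cover)
```

The same recipe applied to a full enumeration of the rationals gives a countable covering of total length at most `eps`, which is why the set of rationals has outer measure zero.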


Next, suppose you have a set A in the plane and you want to measure its area. 
Here is the natural way to do it. 


Definition The area of a rectangle $R = (a, b) \times (c, d)$ is $|R| = (b - a) \cdot (d - c)$ and
the (planar) outer measure of $A \subset \mathbb{R}^2$ is the infimum of the total area of countable
coverings of $A$ by open rectangles $R_k$,

$$m^*A = \inf\Big\{ \sum_k |R_k| : \{R_k\} \text{ covers } A \Big\}.$$

See Figure 136.

Figure 136 Rectangles that cover A

Because it is so natural, the preceding definition makes perfect sense in higher 
dimensions too. 

Definition An open box $B \subset \mathbb{R}^n$ is the Cartesian product of n open intervals,
$B = \prod_k I_k$. Its n-dimensional volume $|B|$ is the product of their lengths. The
n-dimensional outer measure of $A \subset \mathbb{R}^n$ is the infimum of the total volume of
countable coverings of $A$ by open boxes $B_k$,

$$m^*A = \inf\Big\{ \sum_k |B_k| : \{B_k\} \text{ covers } A \Big\}.$$


If need be, we decorate $|\ |$ and $m^*$ with subscripts “1”, “2”, or “n” to distinguish
the linear, planar, and n-dimensional quantities. As in the linear case we write $|R|$ and
$|B|$ only for open rectangles and boxes. The outer measure axioms - monotonicity,
countable subadditivity, and the outer measure of the empty set being zero - are true
for planar outer measure too. See also Exercise 2.

Definition If $Z \subset \mathbb{R}^n$ has outer measure zero then it is a zero set.

2 Proposition Every subset of a zero set is a zero set. The countable union of zero
sets is a zero set. Each plane $P_i(a) = \{(x_1, \ldots, x_n) \in \mathbb{R}^n : x_i = a\}$ is a zero set in
$\mathbb{R}^n$.

Proof Monotonicity implies $m^*Z' \le m^*Z = 0$ whenever $Z'$ is a subset of a zero
set $Z$. If $m^*(Z_k) = 0$ for all $k \in \mathbb{N}$ and $Z = \bigcup Z_k$ then by Theorem 1(c) we have

$$m^*Z \le \sum_k m^*Z_k = 0.$$

We assume $n = 2$. The “plane” $P_i(a)$ is the line $\{x = a\}$ when $i = 1$ or $\{y = a\}$ when
$i = 2$. Given $\epsilon > 0$ we can cover the line $P_1(a)$ with rectangles $R_k = I_k \times J_k$ where

$$I_k = \Big( a - \frac{\epsilon}{k 2^{k+2}},\ a + \frac{\epsilon}{k 2^{k+2}} \Big) \qquad J_k = (-k, k).$$

The total area of these rectangles is $\epsilon$ so $P_1(a)$ is a zero set; the case $i = 2$ is symmetric. □
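
The claim that the rectangles $R_k = I_k \times J_k$ have total area $\epsilon$ can be checked by exact arithmetic. The following is a small sketch, not part of the text's argument: the function names and the sample values of `a` and `eps` are assumptions made for illustration. Each rectangle has width $\epsilon/(k 2^{k+1})$ and height $2k$, so its area is $\epsilon/2^k$.

```python
from fractions import Fraction

def rectangle(a, eps, k):
    """Return (I_k, J_k) from the proof, each as a pair of endpoints."""
    half_width = eps / (k * 2 ** (k + 2))
    return (a - half_width, a + half_width), (Fraction(-k), Fraction(k))

def area(rect):
    """Area of a rectangle given as ((x0, x1), (y0, y1))."""
    (x0, x1), (y0, y1) = rect
    return (x1 - x0) * (y1 - y0)

a, eps = Fraction(3), Fraction(1, 10)   # assumed sample values
areas = [area(rectangle(a, eps, k)) for k in range(1, 31)]
partial_sum = sum(areas)                # sum of the first 30 areas
```

The partial sums are $\epsilon(1 - 2^{-K})$, which increase to $\epsilon$ as $K \to \infty$.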


The next theorem states a property of outer measure that seems obvious. 

3 Theorem The linear outer measure of a closed interval is its length; the planar
outer measure of a closed rectangle is its area; the n-dimensional outer measure of a
closed box is its volume.


Inductive Proof for the Closed Interval [a, b] For each $\epsilon > 0$ the open interval
$(a - \epsilon, b + \epsilon)$ covers $[a, b]$. Thus $m^*([a, b]) \le (b + \epsilon) - (a - \epsilon) = b - a + 2\epsilon$. By the
$\epsilon$-principle we get $m^*([a, b]) \le b - a$.

To get the reverse inequality we must show that if $\{I_i\}$ is a countable open covering
of $[a, b]$ then $\sum |I_i| \ge b - a$. Since $[a, b]$ is compact it suffices to prove this for finite
open coverings $\{I_1, \ldots, I_n\}$. Let $I_i = (a_i, b_i)$. We reason inductively. If $n = 1$ then
$(a_1, b_1) \supset [a, b]$ implies $a_1 < a \le b < b_1$ so $b - a < |I_1|$. That’s the base case of the
induction.

Assume that for each open covering of a compact interval $[c, d]$ by $n$ open intervals
$\{J_i\}$ we have $d - c < \sum_{i=1}^{n} |J_i|$, and let $\{I_i\}$ be a covering of $[a, b]$ by $n + 1$ open
intervals $I_i = (a_i, b_i)$. We claim that $\sum_{i=1}^{n+1} |I_i| > b - a$. One of the intervals contains
$a$, say it is $I_1 = (a_1, b_1)$. If $b_1 > b$ then $I_1 \supset [a, b]$ and again $a_1 < a \le b < b_1$ implies
that $\sum_{i=1}^{n+1} |I_i| \ge |I_1| = b_1 - a_1 > b - a$. On the other hand, if $b_1 \le b$ then

$$[a, b] = [a, b_1) \cup [b_1, b]$$

and $|I_1| > b_1 - a$. The compact interval $[b_1, b]$ is covered by $I_2, \ldots, I_{n+1}$. By induction
we have $\sum_{i=2}^{n+1} |I_i| > b - b_1$. Thus

$$\sum_{i=1}^{n+1} |I_i| = |I_1| + \sum_{i=2}^{n+1} |I_i| > (b_1 - a) + (b - b_1) = b - a$$

which completes the induction and the proof. □


The preceding inductive proof does not carry over to rectangles. For a rectangle 
has no left to right order. However, the following grid proof works for intervals, 
rectangles, and boxes. 


Grid proof for a closed rectangle Let $R = [a, b] \times [c, d] \subset \mathbb{R}^2$. It is simple to see
that $m^*R \le (b - a) \cdot (d - c)$. To check the reverse inequality consider any countable
covering of $R$ by open rectangles $R_i$. We must show that $\sum |R_i| \ge (b - a) \cdot (d - c)$.
Since $R$ is compact the covering has a positive Lebesgue number $\lambda$. Take a grid of
open rectangles $S_j \subset R$ of diameter $< \lambda$ such that $\sum_j |S_j| = (b - a) \cdot (d - c)$. See
Figure 137. Then

$$\sum_j |S_j| \le \sum_i \sum_{S_j \subset R_i} |S_j| \le \sum_i |R_i|$$

implies $(b - a) \cdot (d - c) \le \sum |R_i|$. Thus $(b - a) \cdot (d - c) = m^*R$ as claimed. □

Figure 137 The rectangles $S_1, \ldots, S_4$ are contained in $R_1$. The rectangles
$S_3, \ldots, S_8$ are contained in $R_2$. The rectangles $S_3$ and $S_4$ are contained in
both $R_1$ and $R_2$ so their area will be counted twice in $\sum_i \sum_{S_j \subset R_i} |S_j|$.

4 Corollary The formulas $m^*I = b - a$, $m^*R = (b - a) \cdot (d - c)$, and $m^*B =
\prod_k m^*(I_k)$ hold also for intervals, rectangles, and boxes that are open or partly open.
In particular, $m^*I = |I|$, $m^*R = |R|$, and $m^*B = |B|$ for open intervals, rectangles,
and boxes.

Proof Let $I$ be any interval with endpoints $a < b$ and let $\epsilon > 0$ be given. (We assume
$\epsilon < (b - a)/2$ without loss of generality.) The closed intervals $J = [a + \epsilon, b - \epsilon]$ and
$J' = [a - \epsilon, b + \epsilon]$ sandwich $I$ as $J \subset I \subset J'$. By Theorem 3 we have $m^*J = b - a - 2\epsilon$
and $m^*J' = b - a + 2\epsilon$. Thus

$$b - a - 2\epsilon = m^*J \le m^*I \le m^*J' = b - a + 2\epsilon$$

$$b - a - 2\epsilon \le |I| \le b - a + 2\epsilon.$$

Then $|m^*I - |I|| \le 4\epsilon$ for all $\epsilon > 0$ which implies $m^*I = |I|$. The sandwich method
works equally well for rectangles and boxes. □
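
The double-counting inequality in the grid proof of Theorem 3 can be tried out on a small case. This is a sketch under assumptions: the unit square, the two overlapping covering rectangles, and the 10 x 10 grid are hypothetical choices, not taken from Figure 137. Every grid square lies in at least one covering rectangle, so its area is counted at least once in the double sum, and the grid squares inside a fixed covering rectangle are disjoint, so they contribute at most that rectangle's area.

```python
from fractions import Fraction as F

def area(rect):
    """Area of a rectangle given as ((x0, x1), (y0, y1))."""
    (x0, x1), (y0, y1) = rect
    return (x1 - x0) * (y1 - y0)

def contains(outer, inner):
    """True if the open rectangle `inner` is a subset of the open rectangle `outer`."""
    return all(o0 <= i0 and i1 <= o1 for (o0, o1), (i0, i1) in zip(outer, inner))

# R = [0, 1] x [0, 1], covered by two overlapping open rectangles (assumed example).
cover = [((F(-1, 10), F(6, 10)), (F(-1, 10), F(11, 10))),
         ((F(4, 10), F(11, 10)), (F(-1, 10), F(11, 10)))]

# A 10 x 10 grid of open squares S_j inside R.
n = 10
grid = [((F(i, n), F(i + 1, n)), (F(j, n), F(j + 1, n)))
        for i in range(n) for j in range(n)]

grid_total = sum(area(s) for s in grid)                       # sum_j |S_j|
double_sum = sum(area(s) for r in cover for s in grid
                 if contains(r, s))                           # sum_i sum_{S_j in R_i} |S_j|
cover_total = sum(area(r) for r in cover)                     # sum_i |R_i|
```

Here the grid squares in the overlap are counted twice, which is exactly why the middle quantity can exceed the area of $R$.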


2 Measurability 

If $A$ and $B$ are subsets of disjoint intervals in $\mathbb{R}$ it is easy to show that

$$m^*(A \cup B) = m^*A + m^*B.$$

But what if A and B are merely disjoint? Is the formula still true? The answer 
is “yes” if the sets have an additional property called measurability, and “no” in 
general as is shown in Appendix D. Measurability is the rule and nonmeasurability 
the exception. The sets you meet in analysis - open sets, closed sets, their unions, 
differences, etc. - all are measurable. See Section 4. 


Definition A set $E \subset \mathbb{R}$ is (Lebesgue) measurable if the division $E|E^c$ of $\mathbb{R}$ is so
“clean” that for each “test set” $X \subset \mathbb{R}$ we have

(1) $m^*X = m^*(X \cap E) + m^*(X \cap E^c)$.

The definition of measurability in higher dimensions is analogous. A set $E \subset \mathbb{R}^n$ is
measurable if $E|E^c$ divides each $X \subset \mathbb{R}^n$ so cleanly that (1) is true for n-dimensional
outer measure.

We denote by $\mathcal{M} = \mathcal{M}(\mathbb{R}^n)$ the collection of all Lebesgue measurable subsets of
$\mathbb{R}^n$. If $E$ is measurable its Lebesgue measure is $m^*E$, which we write as $mE$,
dropping the asterisk to emphasize the measurability of $E$.


Which sets are measurable? It is obvious that the empty set is measurable. It 
is also obvious that if a set is measurable then so is its complement, since $E|E^c$ and
$E^c|E$ divide a test set $X$ in the same way.


In the rest of this section we analyze measurability in the abstract. For the basic
facts about measurability have nothing to do with $\mathbb{R}$ or $\mathbb{R}^n$. They hold for any
“abstract outer measure.”


Definition Let $M$ be any set. The collection of all subsets of $M$ is denoted as $2^M$.
An abstract outer measure on $M$ is a function $\omega : 2^M \to [0, \infty]$ that satisfies
the three axioms of outer measure: $\omega(\emptyset) = 0$, $\omega$ is monotone, and $\omega$ is countably
subadditive. A set $E \subset M$ is measurable with respect to $\omega$ if $E|E^c$ is so clean that
for each test set $X \subset M$ we have

$$\omega X = \omega(X \cap E) + \omega(X \cap E^c).$$

Example Given any set $M$ there are two trivial outer measures on $M$. Counting
outer measure assigns to a finite set $S \subset M$ its cardinality and assigns $\infty$ to every
infinite set. The zero/infinity measure assigns outer measure zero to the empty set and
$\infty$ to every other set. All sets are measurable with respect to these outer measures.
See Exercise 10.
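
On a small finite set the clean-division condition can be checked by brute force. The sketch below is illustrative only: the three-point set `M`, the helper names, and in particular the third outer measure `coarse` (0 on the empty set, 2 on all of `M`, 1 on every other subset) are assumptions introduced here, not examples from the text. On a finite set the counting outer measure never takes the value $\infty$, so only its finite part is exercised.

```python
from itertools import combinations

INF = float("inf")
M = frozenset({0, 1, 2})

def subsets(universe):
    """All subsets of a finite set, as frozensets."""
    items = list(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def is_measurable(E, universe, omega):
    """Caratheodory test: E divides every test set X cleanly."""
    complement = universe - E
    return all(omega(X) == omega(X & E) + omega(X & complement)
               for X in subsets(universe))

def counting(A):
    """Counting outer measure (finite sets only here)."""
    return len(A)

def zero_infinity(A):
    """Zero on the empty set, infinity on every other set."""
    return 0 if not A else INF

def coarse(A):
    """Assumed contrast example: 0 on the empty set, 2 on M, 1 otherwise."""
    if not A:
        return 0
    return 2 if A == M else 1

all_counting = all(is_measurable(E, M, counting) for E in subsets(M))
all_zero_inf = all(is_measurable(E, M, zero_infinity) for E in subsets(M))
coarse_measurable = [E for E in subsets(M) if is_measurable(E, M, coarse)]
```

For `coarse`, a two-point test set is split by any nonempty proper subset into pieces of outer measure 1 and 1, while the test set itself has outer measure 1, so only the empty set and `M` divide every test set cleanly.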


Example A less trivial outer measure weights Lebesgue outer measure. One sets
$\omega I = e^{-c^2} |I|$, where $c$ is the midpoint of the interval $I$, and then defines the outer
measure of $A \subset \mathbb{R}$ to be the infimum of the total $\omega$-area of countable interval coverings
of $A$. Other weighting functions can be used.


5 Theorem The collection $\mathcal{M}$ of measurable sets with respect to any outer measure
on any set $M$ is a $\sigma$-algebra and the outer measure restricted to this $\sigma$-algebra is
countably additive. All zero sets are measurable and have no effect on measurability.
In particular Lebesgue measure has these properties.


A $\sigma$-algebra is a collection of sets that includes the empty set, is closed under
complement, and is closed under countable union. Countable additivity of $\omega$ means
that if $E_1, E_2, \ldots$ are measurable with respect to $\omega$ then

$$E = \bigsqcup_i E_i \ \Rightarrow\ \omega E = \sum_i \omega E_i,$$

where $\bigsqcup$ denotes a disjoint union.

Proof Let $\mathcal{M}$ denote the collection of measurable sets with respect to the outer
measure $\omega$ on $M$. First we deal with zero sets, sets for which $\omega Z = 0$. By monotonicity,
if $Z$ is a zero set and $X$ is a test set then

$$\omega X \le \omega(X \cap Z) + \omega(X \cap Z^c) = 0 + \omega(X \cap Z^c) \le \omega X$$

implies $Z$ is measurable. Likewise, if $E|E^c$ divides $X$ cleanly then so do $(E \cup Z)|(E \cup
Z)^c$ and $(E \setminus Z)|(E \setminus Z)^c$. That is, $Z$ has no effect on measurability.

To check that $\mathcal{M}$ is a $\sigma$-algebra we must show that it contains the empty set, is
closed under complements, and is closed under countable union. By the definition of
outer measure the empty set is a zero set so it is measurable, $\emptyset \in \mathcal{M}$. Also, since $E|E^c$
divides a test set $X$ in the same way that $E^c|E$ does, $\mathcal{M}$ is closed under complements.
To check that $\mathcal{M}$ is closed under countable union takes four preliminary steps:


(a) $\mathcal{M}$ is closed under differences.

(b) $\mathcal{M}$ is closed under finite union.

(c) $\omega$ is finitely additive on $\mathcal{M}$.

(d) $\omega$ satisfies a special countable addition formula.

(a) For measurable sets $E_1$, $E_2$, and a test set $X$, draw the Venn diagram in
Figure 138 where $X$ is represented as a disc. To check measurability of $E_1 \setminus E_2$ we
must verify the equation

2 + 134 = 1234

where $2 = \omega(X \cap (E_1 \setminus E_2))$, $134 = \omega(X \cap (E_1 \setminus E_2)^c)$, $1234 = \omega X$, etc. Since $E_1$
divides any set cleanly, 134 = 1 + 34, and since $E_2$ divides any set cleanly, 34 = 3 + 4.
Thus

2 + 134 = 2 + 1 + 3 + 4 = 1 + 2 + 3 + 4.

For the same reason 1234 = 12 + 34 = 1 + 2 + 3 + 4 which completes the proof of
(a).

(b) Suppose that $E_1$, $E_2$ are measurable and $E = E_1 \cup E_2$. Since $E^c = E_1^c \setminus E_2$,
(a) implies that $E^c \in \mathcal{M}$ and thus $E \in \mathcal{M}$. For more than two sets, induction shows
that if $E_1, \ldots, E_n \in \mathcal{M}$ then $E_1 \cup \ldots \cup E_n \in \mathcal{M}$.

(c) If $E_1, E_2 \in \mathcal{M}$ are disjoint then $E_1$ divides $E = E_1 \sqcup E_2$ cleanly, so

$$\omega E = \omega(E \cap E_1) + \omega(E \cap E_1^c) = \omega E_1 + \omega E_2,$$

which is additivity for pairs of measurable sets. For more than two measurable sets,
induction implies that $\omega$ is finitely additive on $\mathcal{M}$; i.e., if $E_1, \ldots, E_n \in \mathcal{M}$ then

$$E = \bigsqcup_{i=1}^{n} E_i \ \Rightarrow\ \omega E = \sum_{i=1}^{n} \omega E_i.$$

Figure 138 The picture that proves $\mathcal{M}$ is closed under differences.

(d) Given a test set $X \subset M$ and a countable disjoint union $E = \bigsqcup E_i$ of
measurable sets we claim that

(2) $\omega(X \cap E) = \sum_i \omega(X \cap E_i)$.

(When $X = M$ this is countable additivity, but in general $X$ need not be measurable.)
Consider the division

$$X \cap (E_1 \sqcup E_2) = (X \cap E_1) \sqcup (X \cap E_2).$$

Measurability of $E_1$ implies that the two outer measures add. By induction the same
is true for any finite sum,

$$\omega(X \cap (E_1 \sqcup \ldots \sqcup E_k)) = \omega(X \cap E_1) + \ldots + \omega(X \cap E_k).$$

Monotonicity of $\omega$ implies that

$$\omega(X \cap E) \ge \omega(X \cap (E_1 \sqcup \ldots \sqcup E_k)),$$

and so $\omega(X \cap E)$ dominates each partial sum of the series $\sum \omega(X \cap E_i)$. Hence it
dominates the series too,

$$\sum_{i=1}^{\infty} \omega(X \cap E_i) \le \omega(X \cap E).$$

The reverse inequality is always true by subadditivity and we get equality, verifying
(2).


Finally, we prove that $E = \bigcup E_i$ is measurable when each $E_i$ is. Taking $E_i' =
E_i \setminus (E_1 \cup \ldots \cup E_{i-1})$, (a) tells us it is no loss of generality to assume the sets $E_i$ are
disjoint, $E = \bigsqcup E_i$. Given a test set $X \subset M$ we know by (c) (finite additivity) and
monotonicity of $\omega$ that

$$\omega(X \cap E_1) + \ldots + \omega(X \cap E_k) + \omega(X \cap E^c)$$
$$= \omega(X \cap (E_1 \sqcup \ldots \sqcup E_k)) + \omega(X \cap E^c)$$
$$\le \omega(X \cap (E_1 \sqcup \ldots \sqcup E_k)) + \omega(X \cap (E_1 \sqcup \ldots \sqcup E_k)^c)$$
$$= \omega X.$$

Being true for all $k$, the inequality holds also for the full series

$$\sum_{i=1}^{\infty} \omega(X \cap E_i) + \omega(X \cap E^c) \le \omega X.$$

From (2) we get

$$\omega(X \cap E) + \omega(X \cap E^c) = \sum_{i=1}^{\infty} \omega(X \cap E_i) + \omega(X \cap E^c) \le \omega X.$$

The reverse inequality is true by subadditivity of $\omega$. This gives equality and shows
that $E$ is measurable. Hence $\mathcal{M}$ is a $\sigma$-algebra and the restriction of $\omega$ to $\mathcal{M}$ is
countably additive. □
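
The conclusions of Theorem 5 can be watched in action on a toy outer measure. The sketch below rests on assumptions made only for illustration: a four-point set split into two hypothetical "blocks", with the outer measure of a set defined as the number of blocks it meets (this is monotone and subadditive, and vanishes on the empty set). The code finds the measurable sets by brute force and checks that they are closed under complement and union and that the outer measure is additive on disjoint measurable sets. On a finite set, countable unions reduce to finite ones.

```python
from itertools import combinations

M = frozenset(range(4))
BLOCKS = [frozenset({0, 1}), frozenset({2, 3})]   # assumed partition of M

def subsets(universe):
    """All subsets of a finite set, as frozensets."""
    items = list(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def omega(A):
    """Assumed outer measure: the number of blocks that A meets."""
    return sum(1 for block in BLOCKS if A & block)

def is_measurable(E):
    """Caratheodory test: E divides every test set X cleanly."""
    return all(omega(X) == omega(X & E) + omega(X - E) for X in subsets(M))

measurable = {E for E in subsets(M) if is_measurable(E)}

closed_under_complement = all(M - E in measurable for E in measurable)
closed_under_union = all(E | F in measurable
                         for E in measurable for F in measurable)
additive = all(omega(E | F) == omega(E) + omega(F)
               for E in measurable for F in measurable if not (E & F))
```

Only unions of whole blocks pass the test here; a single point fails because it splits the block containing it into two pieces that each have outer measure 1.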


From countable additivity we deduce a very useful fact about measures. It applies
to any outer measure $\omega$, in particular to Lebesgue outer measure.

6 Measure Continuity Theorem If $\{E_k\}$ and $\{F_k\}$ are sequences of measurable
sets then

upward measure continuity: $E_k \uparrow E \ \Rightarrow\ \omega E_k \uparrow \omega E$

downward measure continuity: $F_k \downarrow F$ and $\omega F_1 < \infty \ \Rightarrow\ \omega F_k \downarrow \omega F$.

Proof The notation $E_k \uparrow E$ means that $E_1 \subset E_2 \subset \ldots$ and $E = \bigcup E_k$. Write $E$
disjointly as $E = \bigsqcup E_k'$ where $E_k' = E_k \setminus (E_1 \cup \ldots \cup E_{k-1})$. Countable additivity for
measurable sets gives

$$\omega E = \sum_{n=1}^{\infty} \omega E_n'.$$

Also, the $k$th partial sum of the series equals $\omega E_k$, so $\omega E_k$ converges upward to $\omega E$.
The notation $F_k \downarrow F$ means that $F_1 \supset F_2 \supset \ldots$ and $F = \bigcap F_k$. Write $F_1$ disjointly
as

$$F_1 = \Big( \bigsqcup_{k=1}^{\infty} F_k' \Big) \sqcup F$$

where $F_k' = F_k \setminus F_{k+1}$. Then $F_k = \bigsqcup_{n \ge k} F_n' \sqcup F$. The countable additivity formula
for measurable sets

$$\omega F_1 = \omega F + \sum_{n=1}^{\infty} \omega F_n'$$

plus finiteness of $\omega F_1$ implies that the series converges to a finite limit, so its tails
converge to zero. That is,

$$\omega F_k = \sum_{n=k}^{\infty} \omega F_n' + \omega F$$

converges downward to $\omega F$ as $k \to \infty$. □
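
The tail argument for downward continuity can be traced on a concrete nested sequence. This is only a sketch: the choice $F_k = (0, 1/k)$ with Lebesgue measure is an assumed example, for which $F = \bigcap F_k$ is empty and the shells are $F_n' = F_n \setminus F_{n+1} = [1/(n+1), 1/n)$. The function names are hypothetical. The partial tail sums telescope, so they can be compared exactly with $mF_k$.

```python
from fractions import Fraction

def measure_F(k):
    """m(F_k) for the assumed nested sets F_k = (0, 1/k)."""
    return Fraction(1, k)

def measure_shell(n):
    """m(F_n') where F_n' is F_n minus F_{n+1}, the interval [1/(n+1), 1/n)."""
    return Fraction(1, n) - Fraction(1, n + 1)

def tail(k, N):
    """Partial tail sum of m(F_n') for n = k, ..., N."""
    return sum(measure_shell(n) for n in range(k, N + 1))

# tail(k, N) differs from m(F_k) by exactly 1/(N+1), which tends to 0.
gaps = [measure_F(5) - tail(5, N) for N in (10, 100, 1000)]
```

The hypothesis $\omega F_1 < \infty$ cannot be dropped: the half-lines $[k, \infty)$ decrease to the empty set, yet each has infinite Lebesgue measure.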


3 Meseomorphism 

An isomorphism preserves algebraic structure. A homeomorphism preserves topological
structure. A diffeomorphism preserves smooth structure. A “meseomorphism”
preserves measure structure. More precisely, if $M$ and $M'$ are sets with outer measures
$\omega$ and $\omega'$ then a meseomorphism is a bijection $T : M \to M'$ such that $E \mapsto TE$
is a bijection $\mathcal{M} \to \mathcal{M}'$, where $\mathcal{M}$ and $\mathcal{M}'$ are the collections of measurable subsets of
$M$ and $M'$. If $\omega'(TE) = \omega E$ for all measurable $E$ then $T$ is a meseometry.

7 Theorem If a bijection increases outer measure by at most a factor $t$ and its
inverse increases outer measure by at most a factor $1/t$ then it is a meseomorphism.
If $t = 1$ then it is a meseometry.

Proof Let $T : M \to M'$ be the bijection where $M$ and $M'$ are equipped with outer
measures $\omega$ and $\omega'$. For each $X \subset M$ we have

$$\omega X = \omega(T^{-1} \circ T(X)) \le t^{-1} \omega'(TX) \le t^{-1} t\, \omega X = \omega X.$$

Thus $\omega'(TX) = t\,\omega X$, so $T$ multiplies outer measure by $t$ and $T^{-1}$ multiplies outer
measure by $1/t$.

If $E \subset M$ is measurable then we claim $TE$ is measurable. Let $X'$ be a test set in
$M'$. Then $X = T^{-1}(X')$ is a test set in $M$. Since $T$ multiplies outer measure by $t$
and $E$ is measurable we have

$$\omega'(X') = t\,\omega X = t\,(\omega(X \cap E) + \omega(X \cap E^c))$$
$$= t\,\big(t^{-1} \omega'(T(X \cap E)) + t^{-1} \omega'(T(X \cap E^c))\big)$$
$$= \omega'(X' \cap TE) + \omega'(X' \cap T(E^c)).$$

Since $TE$ divides each test set $X' \subset M'$ cleanly, $TE$ is measurable. Likewise for $T^{-1}$,
so $E \mapsto TE$ bijects $\mathcal{M}$ to $\mathcal{M}'$.

If $t = 1$ then $T$ preserves outer measure and therefore it preserves the measure of
measurable sets. It is a meseometry. □


8 Corollary If $D$ is a nonsingular diagonal $n \times n$ matrix then the linear map $D :
\mathbb{R}^n \to \mathbb{R}^n$ sending $v$ to $Dv$ is a meseomorphism of Lebesgue measure. If $E$ is
measurable then $m(DE) = |\det D|\, mE$.


Proof Diagonality implies $D$ carries a box to a box and multiplies its volume by
$d = |\det D|$. Every covering of $A$ by boxes $\{B_i\}$ is carried by $D$ to a covering of $DA$
by boxes and their total volume gets multiplied by $d$. Thus $D$ increases
outer measure by at most the factor $d$. Similarly, $D^{-1}$ increases outer measure by at
most the factor $1/d$. Theorem 7 implies that $D$ is a meseomorphism that multiplies
measure by $d$. □
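
The first sentence of the proof, that a diagonal map carries a box to a box and scales its volume by $|\det D|$, is easy to try out. The sketch below uses assumed sample data (a 3 x 3 diagonal matrix with one negative entry and one box) and hypothetical helper names; exact rational arithmetic avoids rounding.

```python
from fractions import Fraction
from math import prod

def volume(box):
    """Volume of a box given as a list of (left, right) endpoint pairs."""
    return prod(right - left for left, right in box)

def apply_diagonal(diagonal, box):
    """Image of the box under v -> Dv, where D has the given diagonal entries."""
    image = []
    for d, (left, right) in zip(diagonal, box):
        lo, hi = sorted((d * left, d * right))  # a negative entry flips the interval
        image.append((lo, hi))
    return image

diagonal = [Fraction(2), Fraction(-1, 3), Fraction(5)]      # assumed entries of D
box = [(Fraction(0), Fraction(1)),
       (Fraction(-1), Fraction(2)),
       (Fraction(1, 2), Fraction(3))]                       # assumed box
det = prod(diagonal)
image_volume = volume(apply_diagonal(diagonal, box))
```

The absolute value matters: a negative diagonal entry reverses an interval but does not change its length.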


Affine Motions 

An affine motion of $\mathbb{R}^n$ is an invertible linear transformation followed by a
translation. Translation does not affect Lebesgue measure, while Corollary 8 describes how
a diagonal matrix affects it.

9 Theorem An affine motion $T : \mathbb{R}^n \to \mathbb{R}^n$ is a meseomorphism. It multiplies
measure by $|\det T|$.

10 Lemma The boundary of an n-dimensional ball is an n-dimensional zero set.


Proof We assume $n = 2$. If $\Delta$ is the closed unit disc in the plane then $0 < m\Delta < \infty$
since $[-1/\sqrt{2}, 1/\sqrt{2}]^2 \subset \Delta \subset [-1, 1]^2$. The unit circle $C$ is the boundary of $\Delta$. It
is sandwiched between discs $\Delta_-$ of radius $1 - \epsilon$ and $\Delta_+$ of radius $1 + \epsilon$. Corollary 8
implies

$$m(\Delta_-) = (1 - \epsilon)^2 m\Delta < m\Delta < (1 + \epsilon)^2 m\Delta = m(\Delta_+).$$

Measurability implies $m(\Delta_+ \setminus \Delta_-) = m(\Delta_+) - m(\Delta_-) = 4\epsilon\, m\Delta$. Since $\epsilon > 0$ is
arbitrary and $mC \le m(\Delta_+ \setminus \Delta_-)$ we have $mC = 0$. □

11 Lemma Every open cube is a countable disjoint union of open balls plus a zero
set.


Proof Let $S \subset \mathbb{R}^2$ be an open square. It contains a compact disc $\Delta$ whose area is
greater than half the area of the square, $m(\Delta) > m(S)/2$. The difference $U_1 = S \setminus \Delta$
is an open subset of $S$ with $m(U_1) < m(S)/2$. It is therefore the disjoint countable
union of small open squares $S_i$ plus a zero set. Each $S_i$ contains a small compact
disc $\Delta_i$ whose area is greater than half the area of $S_i$. The total area of finitely many
of the discs $\Delta_i$ is greater than half the total area of the squares $S_i$. Thus, for some
$k$, $U_2 = S \setminus (\Delta \cup \Delta_1 \cup \cdots \cup \Delta_k)$ is an open subset of $U_1$ and $m(U_2) < m(S)/4$. See
Figure 139. Repetition gives countably many smaller and smaller disjoint compact
discs with total measure equal to $mS$. Lemma 10 implies the measure of a closed disc
is the same as the measure of its interior, which completes the proof that $S$ consists
of countably many disjoint open discs plus a zero set. □

Figure 139 Each disc occupies greater than half the area of its square.

Proof of Theorem 9 We are assuming that $Tv = Mv$, where $M$ is an invertible
$n \times n$ matrix. We take $n = 2$.


We first claim that if $Z$ is a zero set then so is $TZ$. Given $\epsilon > 0$ there is a countable
covering of $Z$ by rectangles $R_k$ with total area $< \epsilon$. Each $R_k$ can be covered by squares
with total area $< m(R_k) + \epsilon/2^k$. Hence $Z$ can be covered by countably many squares
$S_i$ with total area $< 2\epsilon$. The $T$-image of each square $S_i$ is contained in a square $S_i'$
whose edgelength is $\|T\| \operatorname{diam} S_i$. Thus $TZ$ is contained in squares $S_i'$ whose total area
is at most

$$\sum_i (\|T\| \operatorname{diam} S_i)^2 = \sum_i 2 \|T\|^2 |S_i| \le 4 \|T\|^2 \epsilon.$$

See Figure 140. Since $\epsilon > 0$ is arbitrary we have $m(TZ) = 0$.

Figure 140 The square $S$ has edgelength $\ell$ and diameter $s = \ell\sqrt{2}$. Its
$T$-image is a parallelogram contained in a square $S'$ of edgelength
$\ell' = \|T\| s$. Hence $m(S') \le (\ell')^2 = (\|T\| \sqrt{2}\, \ell)^2 = 2 \|T\|^2 m(S)$.


Next we claim that orthogonal transformations are meseometries. Let $O : \mathbb{R}^2 \to
\mathbb{R}^2$ be orthogonal. It carries the disc $B(r, p)$ to the disc $B(r, Op)$, which is a translate
of $B(r, p)$. Let $S$ be a square. Lemma 11 implies $S = \bigsqcup B_i \sqcup Z$ where the $B_i$ are
discs and $Z$ is a zero set. The $O$-image of each $B_i$ is a disc of equal measure, and the
$O$-image of $Z$ is a zero set. Hence $m(OS) = mS$. Given $\epsilon > 0$ there is a countable
covering of $A$ by squares $S_i$ with $\sum |S_i| < m^*A + \epsilon$. Thus $\{O(S_i)\}$ covers $OA$ and
has total area $< m^*A + \epsilon$. This implies

$$m^*(OA) \le m^*A.$$

Since $O^{-1}$ is also orthogonal, it too does not increase outer measure. Theorem 7
implies $O$ is a meseometry.

Finally we use Polar Form (Appendix D in Chapter 5) to write

$$M = O_1 D O_2$$

where $O_1$ and $O_2$ are orthogonal and $D$ is diagonal. Since $O_1$ and $O_2$ are meseometries
and by Corollary 8 $D$ is a meseomorphism which multiplies measure by $|\det D| =
|\det T|$, the proof is complete. □

12 Corollary Rigid motions of $\mathbb{R}^n$ preserve Lebesgue measure. They are
meseometries.

Proof A rigid motion is a translation followed by an orthogonal transformation. The
determinant of an orthogonal transformation is ±1. □


The concept of a meseomorphism makes natural sense in a more general context.
A measure space is a triple $(M, \mathcal{M}, \mu)$ where $M$ is a set, $\mathcal{M}$ is a $\sigma$-algebra of subsets
of $M$, and $\mu : \mathcal{M} \to [0, \infty]$ has the same basic properties as Lebesgue measure,
namely, $\mu(\emptyset) = 0$, $\mu$ is monotone, and $\mu$ is countably additive. For example, the
triple $(\mathbb{R}^n, \mathcal{M}(\mathbb{R}^n), m)$ is a measure space, and so is the triple $(S^2, \mathcal{M}(S^2), \nu)$ where
$\nu$ is surface area on the 2-sphere $S^2$. A meseomorphism from one measure space
$(M, \mathcal{M}, \mu)$ to another $(N, \mathcal{N}, \nu)$ is a bijection $T : M \to N$ that bijects $\mathcal{M}$ to $\mathcal{N}$
according to $E \mapsto TE$. It is a meseometry if in addition we have $\nu(TE) = \mu E$ for all
$E \in \mathcal{M}$.

Meseometries are not sensitive to topology. See Exercises 19 and 20. 


4 Regularity 

In this section we discuss properties of Lebesgue measure related to the topology of
$\mathbb{R}$ and $\mathbb{R}^n$.


13 Theorem Open sets and closed sets are measurable. 


14 Proposition The half-spaces $[a, \infty) \times \mathbb{R}^{n-1}$ and $(a, \infty) \times \mathbb{R}^{n-1}$ are measurable in
$\mathbb{R}^n$. So are all open boxes.

Proof Without loss of generality we assume $n = 2$. Let $H = [a, \infty) \times \mathbb{R}$. We claim
that $m^*X = m^*(X \cap H) + m^*(X \cap H^c)$ for all test sets $X$. Since $a \times \mathbb{R}$ is a zero set in
$\mathbb{R}^2$ and zero sets have no effect on outer measure (Theorem 5) we may assume that
$X \cap (a \times \mathbb{R}) = \emptyset$. Set

$$X^- = \{(x, y) \in X : x < a\} \qquad X^+ = \{(x, y) \in X : x > a\}.$$

Then $X = X^- \sqcup X^+$. Given $\epsilon > 0$ there is a countable covering $\mathcal{R}$ of $X$ by rectangles $R$
with $\sum_{\mathcal{R}} |R| < m^*X + \epsilon$. Let $\mathcal{R}^{\pm}$ be the collection of rectangles $R^{\pm} = \{(x, y) \in R :
R \in \mathcal{R}$ and $\pm(x - a) > 0\}$. Then $\mathcal{R}^{\pm}$ covers $X^{\pm}$ and

$$m^*X \le m^*(X \cap H) + m^*(X \cap H^c) \le \sum_{\mathcal{R}^+} |R^+| + \sum_{\mathcal{R}^-} |R^-| = \sum_{\mathcal{R}} |R| \le m^*X + \epsilon.$$

Since $\epsilon > 0$ is arbitrary this gives measurability of $H = [a, \infty) \times \mathbb{R}$. Since the line
$x = a$ is a planar zero set, $(a, \infty) \times \mathbb{R}$ is also measurable. The vertical strip $(a, b) \times \mathbb{R}$
is measurable since it is the intersection

$$(a, \infty) \times \mathbb{R} \ \cap\ (-\infty, b) \times \mathbb{R}$$

and $(-\infty, b) \times \mathbb{R} = ([b, \infty) \times \mathbb{R})^c$. Interchanging the coordinates shows that the
horizontal strip $\mathbb{R} \times (c, d)$ is also measurable. The rectangle $R = (a, b) \times (c, d)$ is the
intersection of the strips and is therefore measurable. □

Proof of Theorem 13 Let $U$ be an open subset of $\mathbb{R}^n$. It is the countable union
of open boxes. Since $\mathcal{M}(\mathbb{R}^n)$ is a $\sigma$-algebra and a $\sigma$-algebra is closed with respect
to countable unions, $U$ is measurable. Since a $\sigma$-algebra is closed with respect to
complements, every closed set is also measurable. □

15 Corollary The Lebesgue measure of an interval is its length, the Lebesgue
measure of a rectangle is its area, and the Lebesgue measure of a box is its volume. The
boundary of a box is a zero set and so is the boundary of a ball.

Proof This is just Theorem 3, Proposition 14, and measurability of the sets
involved. □


Sets that are slightly more general than open sets and closed sets arise naturally.
A countable intersection of open sets is called a $G_\delta$-set and a countable union of
closed sets is an $F_\sigma$-set. (“$\delta$” stands for the German word Durchschnitt and “$\sigma$” stands
for “sum.”) By De Morgan’s laws, the complement of a $G_\delta$-set is an $F_\sigma$-set and
conversely. Clearly a homeomorphism sends $G_\delta$-sets to $G_\delta$-sets and $F_\sigma$-sets to
$F_\sigma$-sets. Since the $\sigma$-algebra of measurable sets contains the open sets and the closed
sets it also contains the $G_\delta$-sets and the $F_\sigma$-sets.

16 Theorem Lebesgue measure is regular in the sense that each measurable set $E$
can be sandwiched between an $F_\sigma$-set and a $G_\delta$-set, $F \subset E \subset G$, such that $m(G \setminus F) =
0$. Conversely, if there is such an $F \subset E \subset G$ then $E$ is measurable.

Proof We take $E \subset \mathbb{R}^2$. We assume first that $E$ is bounded and choose a large
rectangle $R$ that contains $E$. We write $E^c = R \setminus E$. Measurability implies

$$mR = mE + m(E^c).$$

There are decreasing sequences of open sets $U_n$ and $V_n$ such that $U_n \supset E$, $V_n \supset
E^c$, $m(U_n) \to mE$, and $m(V_n) \to m(E^c)$ as $n \to \infty$. Measurability of $E$ implies
$m(U_n \setminus E) \to 0$ and $m(V_n \setminus E^c) \to 0$. The complements $K_n = R \setminus V_n$ form an
increasing sequence of closed subsets of $E$ and

$$mK_n = mR - mV_n \to mR - m(E^c) = mE.$$

Thus $F = \bigcup K_n$ is an $F_\sigma$-set contained in $E$ with $mF = mE$. Similarly, $G = \bigcap U_n$
is a $G_\delta$-set that contains $E$ and has $mG = mE$. Because all the measures are finite,
the equality $mF = mE = mG$ implies that $m(G \setminus F) = 0$.

Conversely, if $F$ is an $F_\sigma$-set, $G$ is a $G_\delta$-set, $F \subset E \subset G$, and $m(G \setminus F) = 0$ then
$E$ is measurable since $E = F \cup Z$, where $Z = E \cap (G \setminus F)$ is a zero set.

The unbounded case is left as Exercise 6. □

17 Corollary A bounded subset $E \subset \mathbb{R}^n$ is measurable if and only if it has a
regularity sandwich $F \subset E \subset G$ such that $F$ is an $F_\sigma$-set, $G$ is a $G_\delta$-set, and $mF = mG$.

Proof If $E$ is measurable, bounded or not, then Theorem 16 implies there is a
regularity sandwich with $mF = mE = mG$. Conversely, if there is a regularity sandwich
with $mF = mG$ then boundedness of $E$ implies $mF < \infty$. Measurability of $F$ and $G$
imply $m(G \setminus F) = mG - mF = 0$ and Theorem 16 then implies $E$ is measurable. □

18 Corollary Modulo zero sets, Lebesgue measurable sets are $F_\sigma$-sets and/or
$G_\delta$-sets.


Proof $E = F \cup Z = G \setminus Z'$ for the zero sets $Z = E \setminus F$ and $Z' = G \setminus E$. □

Inner Measure, Hulls, and Kernels

Consider any bounded $A \subset \mathbb{R}^n$, measurable or not. $m^*A$ is the infimum of the
measure of open sets that contain $A$. The infimum is achieved by a $G_\delta$-set that
contains $A$. We call it a hull of $A$ and denote it as $H_A$. It is unique up to a zero
set. Dually, the inner measure of $A$ is the supremum of the measure of closed sets
it contains. The supremum is achieved by an $F_\sigma$-set contained in $A$. We call it a
kernel of $A$ and denote it as $K_A$. It is unique up to a zero set.† We denote the inner
measure of $A$ as $m_*A$. It equals $m(K_A)$. Clearly $m_*A \le m^*A$ and $m_*$ measures $A$
from the inside. Also, $m_*$ is monotone: $A \subset B$ implies $m_*A \le m_*B$.

Remark Theorem 16 implies that a bounded subset of $\mathbb{R}^n$ is measurable if and only
if its inner and outer measures are equal. Lebesgue took this as his definition of
measurability. He said a bounded set is measurable if its inner and outer measures
are equal, and an unbounded set is measurable if it is a countable union of bounded
measurable sets. In contrast, the current definition which uses cleanness and test sets
is due to Carathéodory. It is easier to use (there are fewer complements to consider),
unboundedness has no effect on it, and it generalizes more easily to abstract measure
spaces.

19 Theorem If $A \subset B \subset \mathbb{R}^n$ and $B$ is a box then $A$ is measurable if and only if it
divides $B$ cleanly.

Remark The theorem is also valid for a bounded measurable set B instead of a box, 
but it’s most useful for boxes. It means you don’t need to check clean division of all 
test sets, just clean division of one big box. 

20 Lemma If $A$ is contained in a box $B$ then $mB = m_*A + m^*(B \setminus A)$.

Proof If $K \subset A$ is closed then $B \setminus K$ is open and contains $B \setminus A$. Measurability
implies

$$mB = mK + m(B \setminus K).$$

Maximizing $mK$ minimizes $m(B \setminus K)$ and vice versa. □

Proof of Theorem 19 Lemma 20 implies

$$m_*A + m^*(B \setminus A) = mB.$$

If $A$ divides $B$ cleanly then

$$m^*A + m^*(B \setminus A) = mB.$$

Finiteness of these four quantities permits subtraction, so $m_*A = m^*A$ and $A$ is
measurable. The converse is obvious because a measurable set divides every test set
cleanly. □

†If $A$ is unbounded we need to take a little more care. It is not enough to achieve the infimum or
supremum if they are $\infty$. Rather, we demand that $H_A$ is minimal in the sense that if $H \supset A$ and is
measurable then $H_A \setminus H$ is a zero set. Similarly, we demand maximality of $K_A$ in the sense that if
$K \subset A$ and is measurable then $K \setminus K_A$ is a zero set. See Exercise 6.


5 Products and Slices 

Regularity of Lebesgue measure has a number of uses such as in Exercises 69, 21, 22, 
23, and 73. Here are some more. 

21 Measurable Product Theorem If $A \subset \mathbb{R}^n$ and $B \subset \mathbb{R}^k$ are measurable then
$A \times B$ is measurable and

$$m(A \times B) = mA \cdot mB.$$

By convention $0 \cdot \infty = 0 = \infty \cdot 0$.

22 Lemma If $A$ and $B$ are boxes then $A \times B$ is measurable and $m(A \times B) = mA \cdot mB$.

Proof $A \times B$ is a box and the product formula follows from Corollary 15. □

23 Lemma If $A$ or $B$ is a zero set then $A \times B$ is measurable and $m(A \times B) =
mA \cdot mB = 0$.


Proof We assume $A, B \subset \mathbb{R}$ and $mA = 0$. If $\epsilon > 0$ and $\ell \in \mathbb{N}$ are given then we
cover $A$ with open intervals $I_i$ whose total length is so small that the total area of the
rectangles $I_i \times [-\ell, \ell]$ is $< \epsilon/2^\ell$. The union of all these rectangles covers $A \times \mathbb{R}$ and
has measure $< \epsilon$. The $\epsilon$-Principle implies $m^*(A \times \mathbb{R}) = 0$. Since $A \times B \subset A \times \mathbb{R}$ it
follows that $A \times B$ is a zero set. All zero sets are measurable so we have $m(A \times B) =
mA \cdot mB = 0$. □


24 Lemma Every open set in n-space is a countable union of disjoint open cubes 
plus a zero set. 

Proof Take $n = 2$, accept all the open unit dyadic squares that lie in $U$, and reject
the rest. Bisect every rejected square into four equal subsquares. Accept the interiors
of all these subsquares that lie in $U$, and reject the rest. Proceed inductively, bisecting
the rejected squares, accepting the interiors of the resulting subsquares that lie in $U$,
and rejecting the rest. In this way $U$ is shown to be the countable union of disjoint,
accepted, open dyadic squares, together with the points rejected at every step in the
construction. See Figure 141. Rejected points of $U$ lie on horizontal or vertical dyadic
lines. There are countably many such lines, each is a zero set, and so the rejected
points in $U$ form a zero set. □

Figure 141 An open set is a countable union of dyadic cubes.
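The accept-and-bisect construction in this proof can be tried numerically. The following is a minimal sketch, assuming the open set is the open unit disc (a choice made here purely for illustration); the function name `accepted_area` and the depth cutoff are hypothetical. It sums the areas of the accepted dyadic squares, and since the rejected points form a zero set the sums should creep up toward the area of the disc.

```python
import math


def accepted_area(depth):
    """Total area of the open dyadic squares accepted after `depth` bisections.

    Assumption: the open set U is the open unit disc x^2 + y^2 < 1.
    """
    # The four unit dyadic squares that meet the disc, stored as
    # (lower-left x, lower-left y, side length, bisection level).
    stack = [(x, y, 1.0, 0) for x in (-1.0, 0.0) for y in (-1.0, 0.0)]
    total = 0.0
    while stack:
        x, y, s, level = stack.pop()
        # An open square lies in the open disc exactly when its farthest
        # corner is at distance <= 1 from the origin.
        far = max(cx * cx + cy * cy for cx in (x, x + s) for cy in (y, y + s))
        if far <= 1.0:
            total += s * s  # accept
            continue
        # Point of the closed square nearest the origin; if it is not inside
        # the disc, the square misses U and can never yield accepted pieces.
        nx = max(x, min(0.0, x + s))
        ny = max(y, min(0.0, y + s))
        if nx * nx + ny * ny >= 1.0:
            continue
        if level < depth:
            h = s / 2  # reject and bisect into four subsquares
            stack.extend((x + dx, y + dy, h, level + 1)
                         for dx in (0.0, h) for dy in (0.0, h))
    return total


for d in (2, 4, 6, 8):
    print(d, accepted_area(d), math.pi - accepted_area(d))
```

The gap printed in the last column is the area of the part of the disc not yet covered at that depth; it shrinks roughly in proportion to the side length of the smallest squares.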


25 Lemma If $U$ and $V$ are open then $U \times V$ is measurable and $m(U \times V) = mU \cdot mV$.

Proof We assume $U, V \subset \mathbb{R}$. Since $U \times V$ is open it is measurable. Lemma 24 implies
that $U = \bigsqcup_i I_i \cup Z_U$ and $V = \bigsqcup_j J_j \cup Z_V$ where $I_i$ and $J_j$ are open intervals while
$Z_U$ and $Z_V$ are zero sets. Then

$U \times V = \bigsqcup_{i,j} I_i \times J_j \cup Z$

where $Z = (Z_U \times V) \cup (U \times Z_V)$ is a zero set by Lemma 23. Since

$\sum_{i,j} m(I_i \times J_j) = \sum_{i,j} m(I_i) \cdot m(J_j) = \big(\sum_i m(I_i)\big)\big(\sum_j m(J_j)\big) = mU \cdot mV$

we conclude that $m(U \times V) = mU \cdot mV$. □

Proof of the Measurable Product Theorem We assume $A, B \subset I$ are measurable
where $I$ is the unit interval. We claim that the hull of a product is the product
of the hulls and the kernel of a product is the product of the kernels. Since hulls are
$G_\delta$-sets their product is a $G_\delta$-set and is therefore measurable. Similarly, the product
of kernels is measurable. Clearly

$K_A \times K_B \subset A \times B \subset H_A \times H_B$

and $(H_A \times H_B) \setminus (K_A \times K_B) = ((H_A \setminus K_A) \times H_B) \cup (H_A \times (H_B \setminus K_B))$. Measurability
of $A$ and $B$ implies $m(H_A \setminus K_A) = m(H_B \setminus K_B) = 0$ so Lemma 23 gives

$m(K_A \times K_B) = m(H_A \times H_B)$.

Since $A \times B$ is sandwiched between two measurable sets of the same finite measure,
it is measurable and its measure equals their common value. That is,

(3) $m(K_A \times K_B) = m(A \times B) = m(H_A \times H_B)$.

Let $U_n$ and $V_n$ be sequences of open sets in $I$ converging down to $H_A$ and $H_B$.
Then $U_n \times V_n$ is a sequence of open sets in $I^2$ converging down to $H_A \times H_B$. Downward
measure continuity implies $m(U_n \times V_n) \to m(H_A \times H_B)$. Lemma 25 implies $m(U_n \times
V_n) = m(U_n) \cdot m(V_n)$. Since $m(U_n) \to mA$ and $m(V_n) \to mB$ we conclude from (3)
that $m(A \times B) = mA \cdot mB$. □

Recall from Chapter 5 that the slice of $E \subset \mathbb{R}^n \times \mathbb{R}^k$ at $x \in \mathbb{R}^n$ is the set

$E_x = \{y \in \mathbb{R}^k : (x, y) \in E\}$.

Among other things the next theorem lets us generalize the Measurable Product
Theorem to nonmeasurable sets. See Exercise 73.

26 Zero Slice Theorem If $E \subset \mathbb{R}^n \times \mathbb{R}^k$ is measurable then $E$ is a zero set if and
only if almost every slice of $E$ is a (slice) zero set.

Proof As above, it is no great loss of generality to assume $n = k = 1$ and $E$ is
contained in the unit square. Suppose that $E$ is measurable and $m(E_x) = 0$ for
almost every $x$. We claim $mE = 0$.

Let $Z = \{x : E_x \text{ is not a zero set}\}$. $Z$ is a zero set. The slices $E_x$ for which $E_x$
is not a zero set are contained in $Z \times \mathbb{R}$ which, as proved above, is a zero set in $\mathbb{R}^2$.
Then $E \setminus (Z \times \mathbb{R})$ is measurable, has the same measure as $E$, and so it is no loss of
generality to assume that every slice $E_x$ is a zero set.

It suffices to show that the inner measure of $E$ is zero. For measurability implies
$m_*E = m^*E$. Let $K$ be any compact subset of $E$ and let $\epsilon > 0$ be given. The slice
$K_x$ is compact and it has slice measure zero. Therefore it has an open neighborhood
$V(x)$ such that $m(V(x)) < \epsilon$. Compactness of $K$ implies that for all $x'$ near $x$ we have
$K_{x'} \subset V(x)$. For otherwise there is a sequence $(x_n, y_n)$ in $K$ with $x_n \to x$ and
$y_n \notin V(x)$. Passing to a subsequence, $(x_n, y_n) \to (x, y)$, and $y \notin V(x)$ because $V(x)$ is
open. Closedness of $K$ implies $(x, y) \in K$, so $y \in K_x \subset V(x)$, a contradiction. Hence
if $U(x)$ is small then for all $x' \in U(x)$ we have $x' \times K_{x'} \subset W(x) = U(x) \times V(x)$. See
Figure 142.

Figure 142 The open set $V(x)$ contains the slice $K_x$ and has small
measure. If $x'$ lies in a small enough neighborhood $U(x)$ of $x$ then the set
$x' \times K_{x'}$ lies in $W(x) = U(x) \times V(x)$. These sets $x' \times K_{x'}$ are shown in the
enlarged picture as vertical segments in $K$.

We can choose these small open sets $U(x)$ from a countable base of the topology
of $\mathbb{R}$, for instance the intervals with rational endpoints. This gives a countable
covering of $K$ by thin product sets $W_i = U_i \times V_i$ such that $m(V_i) < \epsilon$ for each $i$. We
disjointify the covering by setting

$U'_i = U_i \setminus (U_1 \cup \dots \cup U_{i-1})$.

The sets $U'_i$ are measurable, disjoint, and since $E$ is contained in the unit square they
all lie in the unit interval. Hence their total one-dimensional measure is $\le 1$. The
sets $W'_i = U'_i \times V_i$ are disjoint, are measurable, and cover $K$. Theorem 21 implies
$m(W'_i) = m(U'_i) \cdot m(V_i)$ so their total planar measure is $\le \sum m(U'_i) \cdot \epsilon \le \epsilon$. Hence
$mK = 0$, which implies $m_*E = 0$ and completes the proof that $E$ is a zero set.


Conversely, suppose that $E$ is a zero set. Regularity implies there is a $G_\delta$-set
$G \supset E$ with $mG = 0$ and it suffices to show that almost every slice of $G$ is a zero
set. The slices of a $G_\delta$-set are $G_\delta$-sets and in particular each slice $G_x$ is measurable.
Let $X(\alpha) = \{x : m(G_x) > \alpha\}$. We claim that $m^*(X(\alpha)) = 0$. For each $x \in X(\alpha)$,
inner regularity gives a compact set $K(x) \subset G_x$ with $m(K(x)) > \alpha$.

Let $U$ be any open subset of $I^2$ that contains $G$. If $x \in X(\alpha)$ then $x \times K(x)$ is
a compact subset of $U$ and there is a product neighborhood $W(x) = U(x) \times V(x)$
of $x \times K(x)$ with $W(x) \subset U$. Since $K(x) \subset V(x)$ we have $m(V(x)) > \alpha$. Again we
can assume the neighborhoods $U(x)$ belong to some countable base for the topology
of $\mathbb{R}$. This gives a countable family $\{U_i\}$ that covers $X(\alpha)$. As above, set $U'_i =
U_i \setminus (U_1 \cup \dots \cup U_{i-1})$. Disjointness and Theorem 21 imply

$mU \ge \sum m(U'_i \times V_i) = \sum m(U'_i) \cdot m(V_i) \ge \sum m(U'_i) \cdot \alpha \ge \alpha \cdot m^*(X(\alpha))$.

Since $mG = 0$ there are open sets $U \supset G \supset E$ with arbitrarily small measure. Thus
$X(\alpha)$ is a zero set and so is $\bigcup_{\ell \in \mathbb{N}} X(1/\ell)$. That is, $m(E_x) = 0$ for almost every $x$. □


Remark Measurability of $E$ is a necessary condition in Theorem 26. See Exercise 25.


6 Lebesgue Integrals

Following J.C. Burkill, we justify the maxim that the integral of a function is the
area under its graph. Let $f : \mathbb{R} \to [0, \infty)$ be given.†

Definition The undergraph of $f$ is

$\mathcal{U}f = \{(x, y) \in \mathbb{R} \times [0, \infty) : 0 \le y < f(x)\}$.

The function $f$ is (Lebesgue) measurable if $\mathcal{U}f$ is measurable with respect to
planar Lebesgue measure, and if it is then the Lebesgue integral of $f$ is the measure
of the undergraph

$\int f = m(\mathcal{U}f)$.

Figure 143 The geometric definition of the integral is the measure of the undergraph.

See Figure 143.

Burkill refers to the undergraph as the ordinate set of $f$. The notation for the
Lebesgue integral intentionally omits the usual “$dx$” and the limits of integration to
remind you that it is not merely the ordinary Riemann integral $\int_a^b f(x)\,dx$ or the
improper Riemann integral $\int_{-\infty}^{\infty} f(x)\,dx$.

Since a measurable set can have infinite measure we permit $\int f = \infty$.
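To make the "integral equals measure of the undergraph" definition concrete, here is a small sketch that squeezes the planar measure of an undergraph between two grid counts. It assumes the hypothetical example $f(x) = x^2$ on $[0, 1)$ and $f = 0$ elsewhere, and the helper name `undergraph_bounds` is invented for this illustration. Grid squares of side $1/n$ that lie inside the undergraph give a lower bound, and grid squares that meet it give an upper bound.

```python
from fractions import Fraction


def undergraph_bounds(n):
    """Inner and outer grid approximations to m(Uf) for f(x) = x^2 on [0, 1).

    Assumption: f vanishes outside [0, 1), so its undergraph sits over [0, 1).
    Cells are [i/n, (i+1)/n) x [j/n, (j+1)/n), each of area 1/n^2.
    """
    cell_area = Fraction(1, n * n)
    # A cell in column i lies inside the undergraph when (j+1)/n <= f(i/n),
    # which leaves floor(i^2 / n) admissible values of j.
    inner = sum((i * i) // n for i in range(n)) * cell_area
    # A cell in column i meets the undergraph when j/n < sup f = ((i+1)/n)^2,
    # which leaves ceil((i+1)^2 / n) admissible values of j.
    outer = sum(-(-((i + 1) ** 2) // n) for i in range(n)) * cell_area
    return inner, outer


for n in (10, 100, 1000):
    inner, outer = undergraph_bounds(n)
    print(n, float(inner), float(outer))
```

Both bounds are exact rational numbers, and the common value they approach is the area under the parabola, in agreement with the Riemann integral of $x^2$ over $[0, 1]$.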

†In this section we deal with functions of one variable. The multivariable case in which $f : \mathbb{R}^n \to \mathbb{R}$
offers no new ideas, only new notation.

Definition The function $f : \mathbb{R} \to [0, \infty)$ is Lebesgue integrable if (it is measurable
and) its integral is finite.† The set of integrable functions is denoted by $L^1$, $\mathcal{L}^1$, or $\mathcal{L}$.

The three basic convergence theorems for Lebesgue integrals are the Monotone 
Convergence Theorem, the Dominated Convergence Theorem, and Fatou’s Lemma. 
Their proofs are easy if you look at the right undergraph pictures. We write $f_n \to f$
a.e. to indicate that $\lim_{n \to \infty} f_n(x) = f(x)$ for almost every $x$, i.e., for all $x$ not
belonging to some zero set.‡ (See Chapter 3 for previous use of the phrase “almost every”
in connection with Riemann integrability.) However, we often abuse the notation
by dropping the “a.e.” for clarity. This is rarely a problem since Lebesgue theory
systematically neglects zero sets; as Theorem 5 states, zero sets have no effect on
measurability or measure, and thus no effect on integrals.§

27 Monotone Convergence Theorem Assume that $(f_n)$ is a sequence of measurable
functions $f_n : \mathbb{R} \to [0, \infty)$ and $f_n \uparrow f$ a.e. as $n \to \infty$. Then

$\int f_n \uparrow \int f$.

Proof Obvious from Figure 144. □

Definition The completed undergraph of $f : \mathbb{R} \to [0, \infty)$ is

$\widehat{\mathcal{U}}f = \{(x, y) \in \mathbb{R} \times [0, \infty) : 0 \le y \le f(x)\}$.

It is the undergraph plus the graph.

28 Proposition $\mathcal{U}f$ is measurable if and only if $\widehat{\mathcal{U}}f$ is measurable, and if measurable
then their measures are equal.

Proof For $n \in \mathbb{N}$ let $T_{\pm n} : \mathbb{R}^2 \to \mathbb{R}^2$ send $(x, y)$ to $(x, (1 \pm 1/n)y)$. The matrix that
represents $T_{\pm n}$ is

$\begin{bmatrix} 1 & 0 \\ 0 & 1 \pm 1/n \end{bmatrix}$.

By Corollary 8 $T_{\pm n}$ is a meseomorphism and $m(T_n(\mathcal{U}f)) = (1 + 1/n)\,m(\mathcal{U}f)$. The
intersection $\bigcap T_n(\mathcal{U}f)$ is $\widehat{\mathcal{U}}f$ except for points $(x, 0)$ of the $x$-axis at which $f(x) = 0$.
The $x$-axis is a planar zero set and has no effect on measurability. Therefore $\widehat{\mathcal{U}}f$ is
measurable.

Similarly, $\mathcal{U}f$ is the union of the sets $T_{-n}(\widehat{\mathcal{U}}f)$ except for points on the $x$-axis
and so measurability of $\widehat{\mathcal{U}}f$ implies measurability of $\mathcal{U}f$. Upward measure continuity
implies that

$m(\mathcal{U}f) = \lim_{n \to \infty} (1 - 1/n)\,m(\widehat{\mathcal{U}}f) = m(\widehat{\mathcal{U}}f)$

which completes the proof. □

Figure 144 $f_n \uparrow f$ implies $\mathcal{U}f_n \uparrow \mathcal{U}f$. Upward measure continuity
(Theorem 6) then implies $\int f_n = m(\mathcal{U}f_n) \uparrow m(\mathcal{U}f) = \int f$.

†Thus the integral of a measurable nonnegative function exists even if the function is not
integrable. To avoid this abuse of language the word “summable” is sometimes used in place of
“integrable” to indicate that $\int f < \infty$.

‡You may also come across the abbreviation “p.p.” for the French presque partout.

§As informal notation one might try decorating the standard symbols “$\to$”, “$=$”, “$\forall$”, etc. with
small zeros indicating “up to a zero set.” Thus $f_n \underset{\circ}{\to} f$ would indicate a.e. convergence, $A \underset{\circ}{=} B$
would indicate set equality except for a zero set, $\underset{\circ}{\forall}$ would indicate for almost every, and so on. But
really, would you benefit very much from formulas like $f_n \underset{\circ}{\to} f \underset{\circ}{<} g$?

29 Corollary If $(f_n)$ is a sequence of integrable functions that converges monotonically
downward to a limit function $f$ almost everywhere then

$\int f_n \downarrow \int f$.

Proof Since $m(\widehat{\mathcal{U}}(f_n)) = \int f_n$ is finite, downward measure continuity is valid.
Proposition 28 then implies

$\int f_n = m(\mathcal{U}(f_n)) = m(\widehat{\mathcal{U}}(f_n)) \downarrow m(\widehat{\mathcal{U}}f) = m(\mathcal{U}f) = \int f$

as $n \to \infty$. □

Definition If $f_n : X \to [0, \infty)$ is a sequence of functions then the lower and upper
envelope sequences are

$\underline{f}_n(x) = \inf\{f_k(x) : k \ge n\}$ and $\overline{f}_n(x) = \sup\{f_k(x) : k \ge n\}$.

We permit $\overline{f}_n(x) = \infty$.

30 Proposition $\mathcal{U}(\overline{f}_n) = \bigcup_{k \ge n} \mathcal{U}(f_k)$ and $\widehat{\mathcal{U}}(\underline{f}_n) = \bigcap_{k \ge n} \widehat{\mathcal{U}}(f_k)$.

Proof We have

$(x, y) \in \mathcal{U}(\overline{f}_n) \iff y < \sup\{f_k(x) : k \ge n\}$

$\iff \exists\, \ell \ge n$ such that $y < f_\ell(x)$

$\iff \exists\, \ell \ge n$ such that $(x, y) \in \mathcal{U}(f_\ell)$

$\iff (x, y) \in \bigcup_{k \ge n} \mathcal{U}(f_k)$.

The other equality is checked the same way. □

31 Dominated Convergence Theorem If $f_n : \mathbb{R} \to [0, \infty)$ is a sequence of measurable
functions such that $f_n \to f$ a.e. and if there exists a function $g : \mathbb{R} \to [0, \infty)$
whose integral is finite and which is an upper bound for all the functions $f_n$ then $f$
is integrable and $\int f_n \to \int f$ as $n \to \infty$.

Proof Obvious from Figure 145. □

Figure 145 Dominated convergence. Proposition 30 implies the envelope
functions are measurable. Due to the dominator $g$ they are integrable. The
Monotone Convergence Theorem and Corollary 29 imply their integrals
converge to $\int f$. Since $\mathcal{U}(\underline{f}_n) \subset \mathcal{U}(f_n) \subset \mathcal{U}(\overline{f}_n)$ the integral of $f_n$ also
converges to $\int f$.

Remark If a dominator $g$ with finite integral fails to exist then the assertion fails.
For example, the steeple functions shown in Figure 89 on page 214 have
integral $n$ and converge at all $x$ to the zero function as $n \to \infty$. See Exercise 33.


32 Corollary The pointwise limit of measurable functions is measurable.

Proof $\mathcal{U}(\underline{f}_n)$ is measurable and converges upward to $\mathcal{U}f$. □

33 Fatou’s Lemma If $f_n : \mathbb{R} \to [0, \infty)$ is a sequence of measurable functions then

$\int \liminf f_n \le \liminf \int f_n$.

Proof The assertion is really more about liminfs than integrals. The liminf of the
sequence $(f_n)$ is $f = \lim_{n \to \infty} \underline{f}_n$, where $\underline{f}_n$ is the lower envelope function. Since $\underline{f}_n \uparrow f$,
the Monotone Convergence Theorem implies $\int \underline{f}_n \uparrow \int f$, and since $\underline{f}_n \le f_n$ we have
$\int f \le \liminf \int f_n$. □


Remark The inequality in Fatou’s Lemma can be strict as is shown by the steeple 
functions. See Exercise 33. 
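A strict Fatou inequality is easy to see on a simpler family than the steeple functions. The sketch below assumes the stand-in sequence $f_n = n \cdot \chi_{(0, 1/n)}$, which is not the book's Figure 89 example; the names `f` and `integral` are hypothetical. Every undergraph is a rectangle of area 1, yet the functions converge to zero at every point.

```python
from fractions import Fraction


def f(n, x):
    """Stand-in sequence (an assumption): f_n = n on (0, 1/n) and 0 elsewhere."""
    return Fraction(n) if 0 < x < Fraction(1, n) else Fraction(0)


def integral(n):
    """Measure of the undergraph of f_n, the rectangle (0, 1/n) x [0, n)."""
    return Fraction(1, n) * n


x = Fraction(1, 7)
print([f(n, x) for n in range(1, 12)])    # values at a fixed point
print([integral(n) for n in range(1, 6)])  # integrals along the sequence
```

At each fixed $x$ the values are eventually 0, so $\int \liminf f_n = 0$, while $\liminf \int f_n = 1$. No integrable dominator exists for this family, which is consistent with the remark following the Dominated Convergence Theorem.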


Having established the three basic convergence theorems for Lebesgue integrals 
using mainly pictures of undergraphs, we collect some integration facts of a more 
mundane character. 

34 Theorem Let $f, g : \mathbb{R} \to [0, \infty)$ be measurable functions.

(a) If $f \le g$ then $\int f \le \int g$.

(b) If $\mathbb{R} = \bigsqcup_{k=1}^{\infty} X_k$ and each $X_k$ is measurable then

$\int f = \sum_{k=1}^{\infty} \int_{X_k} f$.

(c) If $X \subset \mathbb{R}$ is measurable then $mX = \int \chi_X$.

(d) If $mX = 0$ then $\int_X f = 0$.

(e) If $f(x) = g(x)$ almost everywhere then $\int f = \int g$.

(f) If $c \ge 0$ then $\int cf = c \int f$.

(g) The integral of $f$ is zero if and only if $f(x) = 0$ for almost every $x$.

(h) $\int f + g = \int f + \int g$.

Proof Assertions (a) - (g) are obvious from what we know about measure.

(a) $f \le g$ implies $\mathcal{U}f \subset \mathcal{U}g$ implies $m(\mathcal{U}f) \le m(\mathcal{U}g)$.

(b) The product $X_k \times \mathbb{R}$ is measurable and its intersection with $\mathcal{U}f$ is $\mathcal{U}f|_{X_k}$. Thus
$\mathcal{U}f = \bigsqcup_{k=1}^{\infty} \mathcal{U}f|_{X_k}$ and countable additivity of planar measure gives the result.

(c) The planar measure of the product $\mathcal{U}(\chi_X) = X \times [0, 1)$ is $mX$.

(d) $\mathcal{U}f|_X$ is contained in the product $X \times \mathbb{R}$ of zero planar measure.

(e) Almost everywhere equality of $f$ and $g$ means there is a zero set $Z \subset \mathbb{R}$ such
that if $x \notin Z$ then $f(x) = g(x)$. Apply (b), (d) to $\mathbb{R} = Z \sqcup (\mathbb{R} \setminus Z)$.

(f) According to Theorem 9 scaling the $y$-axis by the factor $c$ scales planar measure
correspondingly.

(g) The Zero Slice Theorem (Theorem 26) asserts that $\mathcal{U}f$ is a zero set if and only if
almost every vertical slice is a slice zero set. The vertical slices are the segments
$[0, f(x))$.

(h) This requires a new concept and a corresponding picture. See Theorem 35,
Corollary 36, and Figure 146. □


Definition If $f : \mathbb{R} \to \mathbb{R}$ then $f$-translation $T_f : \mathbb{R}^2 \to \mathbb{R}^2$ sends the point $(x, y)$ to
the point $(x, y + f(x))$.

$T_f$ slides points along the vertical lines $x \times \mathbb{R}$ and

$T_f \circ T_g = T_{f+g} = T_g \circ T_f$

so $T_f$ is a bijection whose inverse is $T_{-f}$.
35 Theorem If $f : \mathbb{R} \to [0, \infty)$ is integrable then $T_f$ preserves planar Lebesgue
measure; i.e., it is a meseometry.

Proof We must show that $T_f$ bijects the class $\mathcal{M}$ of Lebesgue measurable subsets of
$\mathbb{R}^2$ to itself and $m(T_f E) = mE$ for all $E \in \mathcal{M}$.

Consider Figure 146. It demonstrates that for any two nonnegative functions on
$\mathbb{R}$ we have two ways to express $\mathcal{U}(f + g)$, namely

$\mathcal{U}f \sqcup T_f(\mathcal{U}g) = \mathcal{U}(f + g) = T_g(\mathcal{U}f) \sqcup \mathcal{U}g$.

Figure 146 The undergraph of a sum

First we consider the function

$g(x) = h$ if $x \in I$, and $g(x) = 0$ otherwise,

where $I$ is an interval in $\mathbb{R}$ and $h$ is a positive constant. See Figure 147. The
undergraph of $g$ is the rectangle $R = I \times [0, h)$. The $T_f$-image of $R$ is the same as the
$T_{f_I}$-image, where $f_I(x) = f(x) \cdot \chi_I(x)$. Thus we can assume that $f(x) = 0$ for $x \notin I$.
On the strip $I \times \mathbb{R}$, which then contains $\mathcal{U}f$, the map $T_g$ is vertical translation by the
constant $h$, and since Lebesgue measure is translation invariant we get measurability
of $T_g(\mathcal{U}f)$. Then $\mathcal{U}f \sqcup T_f R = T_g(\mathcal{U}f) \sqcup R$
implies $T_f R$ is measurable and

$m(\mathcal{U}f) + m(T_f R) = m(T_g(\mathcal{U}f)) + mR$.

Since $m(\mathcal{U}f) < \infty$, subtraction is legal and we get $m(T_f R) = mR$. If we translate $R$
vertically by $k$ then we have a rectangle $T_k R = I \times [k, h + k)$ and $T_f(T_k R) = T_k \circ T_f R$
implies that $T_f$ sends each rectangle $I \times [c, d)$ to a measurable set of the same measure.

We claim that $T_f$ never increases outer measure. If $S \subset \mathbb{R}^2$ and $\epsilon > 0$ is given
then we cover $S$ with countably many rectangles $R_i$ such that

$\sum m(R_i) \le m^*S + \epsilon$.

Then $T_f S$ is covered by countably many measurable sets $T_f(R_i)$ with total measure
$\le m^*S + \epsilon$. From countable subadditivity and the $\epsilon$-Principle we deduce $m^*(T_f S) \le
m^*S$. The same is true for $T_{-f}$ since

$T_{-f} = \psi \circ T_f \circ \psi$

where $\psi : \mathbb{R}^2 \to \mathbb{R}^2$ is the meseometry sending $(x, y)$ to $(x, -y)$. Neither $T_f$ nor its
inverse increases outer measure, so Theorem 7 implies $T_f$ is a meseometry. □

Figure 147 $T_f$ translates $R$ upward by $f$ and $T_g$




36 Corollary If $f : \mathbb{R} \to [0, \infty)$ and $g : \mathbb{R} \to [0, \infty)$ are integrable then

$\int f + g = \int f + \int g$.

Proof Since $\mathcal{U}(f + g) = \mathcal{U}f \sqcup T_f(\mathcal{U}g)$ and $T_f$ is a meseometry we see that $f + g$ is
measurable and $m(\mathcal{U}(f + g)) = m(\mathcal{U}f) + m(\mathcal{U}g)$. That is, the integral of the sum is
the sum of the integrals. □

Remark The standard proof of linearity of the Lebesgue integral is outlined in Ex- 
ercise 47. It is no easier than this undergraph proof, and undergraphs at least give 
you a picture as guidance. 

37 Corollary If $f_k : \mathbb{R} \to [0, \infty)$ is a sequence of integrable functions then

$\sum_{k=1}^{\infty} \int f_k = \int \sum_{k=1}^{\infty} f_k$.

Proof Let $F_n(x) = \sum_{k=1}^{n} f_k(x)$ be the $n$th partial sum and $F(x) = \sum_{k=1}^{\infty} f_k(x)$.
Then $F_n(x) \uparrow F(x)$ as $n \to \infty$. The Monotone Convergence Theorem implies $\int F_n \to
\int F$. Corollary 36 implies $\sum_{k=1}^{n} \int f_k = \int \sum_{k=1}^{n} f_k$ and the assertion follows. □


Until now we have assumed the integrand $f$ is nonnegative. If $f$ takes both
positive and negative values we define

$f_+(x) = f(x)$ if $f(x) \ge 0$, and $f_+(x) = 0$ if $f(x) < 0$;

$f_-(x) = -f(x)$ if $f(x) < 0$, and $f_-(x) = 0$ if $f(x) \ge 0$.

Then $f_\pm \ge 0$ and $f = f_+ - f_-$. See Exercise 28. If $f_\pm$ are integrable we say that $f$ is
integrable and define its integral as

$\int f = \int f_+ - \int f_-$.

38 Proposition The set of measurable functions $f : \mathbb{R} \to \mathbb{R}$ is a vector space, the
set of integrable functions is a subspace, and the integral is a linear map from the
latter into $\mathbb{R}$.

The proof is left to the reader as Exercise 32. 


7 Italian Measure Theory 

In Chapter 5 the slice method is developed in terms of Riemann integrals. Here we
generalize to Lebesgue integrals. If $E \subset \mathbb{R}^k \times \mathbb{R}^n$ and $x \in \mathbb{R}^k$ then the $x$-slice
of $E$ through the point $x$ is

$E_x = \{y \in \mathbb{R}^n : (x, y) \in E\}$.

The $y$-slice is $E^y = \{x : (x, y) \in E\}$. Similarly, the $x$-slice and $y$-slice of a function
$f : E \to \mathbb{R}$ are $f_x : y \mapsto f(x, y)$ and $f^y : x \mapsto f(x, y)$.

Remark In this section we frequently write dx and dy to indicate which variable is 
the integration variable. 

39 Cavalieri’s Principle If $E$ is measurable then almost every slice $E_x$ of $E$ is
measurable, the function $x \mapsto m(E_x)$ is measurable, and its integral is

(4) $mE = \int m(E_x)\,dx$.

(Note that $mE$ refers to $(k + n)$-dimensional measure while $m(E_x)$ refers to
$n$-dimensional measure.)

See Figure 148.

Figure 148 Slicing a planar set

Proof We take $k = 1 = n$. The proof of the Zero Slice Theorem (Theorem 26)
contains the hard work; if $E$ is a zero set then it asserts that almost every slice $E_x$
is a zero set, and since the integral of a function that vanishes almost everywhere is
zero we get (4) for zero sets.

(4) is obvious for boxes, and hence it holds also for open sets. After all, an open
set is the disjoint union of boxes and a zero set, and slicing preserves disjointness.

The Dominated Convergence Theorem promotes (4) from open sets to bounded
$G_\delta$-sets.

(4) holds for bounded measurable sets since each is a bounded $G_\delta$-set minus a
zero set. The general measurable set $E$ is a disjoint union of bounded measurable
sets, $E = \bigsqcup E_i$, so countable additivity gives (4) for $E$. □

The proof of Cavalieri’s Principle in higher dimensions differs only notationally
from the proof in $\mathbb{R}^2$. See also Appendix B of Chapter 5 and Exercise 44.
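Formula (4) can be checked numerically on a familiar set. The sketch below assumes $E$ is the open unit disc in the plane, so that each slice $E_x$ is an interval of length $2\sqrt{1 - x^2}$ for $|x| < 1$; the function names and the midpoint rule are choices made for this illustration only.

```python
import math


def slice_length(x):
    """One-dimensional measure of the slice E_x of the open unit disc E."""
    return 2.0 * math.sqrt(1.0 - x * x) if abs(x) < 1.0 else 0.0


def area_by_slices(n):
    """Midpoint-rule approximation of the integral of m(E_x) over [-1, 1]."""
    width = 2.0 / n
    return sum(slice_length(-1.0 + (i + 0.5) * width) for i in range(n)) * width


print(area_by_slices(2000), math.pi)
```

Integrating the slice lengths reproduces the planar measure of the disc to within the accuracy of the quadrature, as (4) predicts.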

40 Corollary The $y$-slices of an undergraph decrease monotonically as $y$ increases,
and the following formulas hold:

$(\mathcal{U}f)^a = \bigcup_{y > a} (\mathcal{U}f)^y$ and $(\widehat{\mathcal{U}}f)^a = \bigcap_{y < a} (\mathcal{U}f)^y$.

Every horizontal slice of a measurable undergraph is measurable.

Proof Monotonicity and the formulas follow from

$(\mathcal{U}f)^a = \{x : a < fx\} = \{x : \exists\, y > a \text{ such that } y < fx\}$

$(\widehat{\mathcal{U}}f)^a = \{x : a \le fx\} = \{x : \forall\, y < a \text{ we have } y < fx\}$.

We fix an arbitrary $a$ and ask: Are the slices $(\mathcal{U}f)^a$ and $(\widehat{\mathcal{U}}f)^a$ measurable?
Cavalieri’s Principle implies that almost every horizontal slice of a measurable undergraph
is measurable. Thus, there exist $y_n \downarrow a$ such that $(\mathcal{U}f)^{y_n}$ is measurable. By
monotonicity, $(\mathcal{U}f)^a = \bigcup_n (\mathcal{U}f)^{y_n}$ gives measurability of $(\mathcal{U}f)^a$. Similarly for the completed
undergraph. □


41 Corollary Undergraph measurability is equivalent to the more common definition
using preimages.

Proof We say that $f : \mathbb{R} \to [0, \infty)$ is preimage measurable if for each $a \in [0, \infty)$
the preimage $f^{\mathrm{pre}}[a, \infty) = \{x : fx \ge a\}$ is a measurable subset of the line. (See also
Appendix A.) Since

$f^{\mathrm{pre}}[a, \infty) = \{x : a \le fx\} = (\widehat{\mathcal{U}}f)^a$

by Corollary 40, we see that undergraph measurability implies preimage measurability.
The converse follows from the equation

$\mathcal{U}f = \bigcup_{0 < a \in \mathbb{Q}} f^{\mathrm{pre}}[a, \infty) \times [0, a)$. □


As a consequence of Cavalieri’s Principle in 3-space we get the integral theorems
of Fubini and Tonelli. It is standard practice to refer to the integral of a function $f$
on $\mathbb{R}^2$ as a double integral and to write it as

$\int f = \iint f(x, y)\,dxdy$.

It is also standard to write the iterated integral as

$\int \Big[ \int f_x(y)\,dy \Big] dx = \int \Big[ \int f(x, y)\,dy \Big] dx$.

42 Fubini-Tonelli Theorem If $f : \mathbb{R}^2 \to [0, \infty)$ is measurable then almost every
slice $f_x(y)$ is a measurable function of $y$, the function $x \mapsto \int f_x(y)\,dy$ is measurable,
and the double integral equals the iterated integral,

$\iint f(x, y)\,dxdy = \int \Big[ \int f(x, y)\,dy \Big] dx$.

Proof The result follows from the simple observation that the slice of the undergraph
is the undergraph of the slice,

(5) $(\mathcal{U}f)_x = \mathcal{U}(f_x)$.

See Figure 149. For (5) implies that $m_2((\mathcal{U}f)_x) = m_2(\mathcal{U}(f_x)) = \int f(x, y)\,dy$, and then
Cavalieri gives

$\iint f(x, y)\,dxdy = m_3(\mathcal{U}f) = \int \big[ m_2((\mathcal{U}f)_x) \big] dx = \int \Big[ \int f(x, y)\,dy \Big] dx$. □


43 Corollary When $f : \mathbb{R}^2 \to [0, \infty)$ is measurable the order of integration in the
iterated integrals is irrelevant,

$\int \Big[ \int f(x, y)\,dy \Big] dx = \iint f(x, y)\,dxdy = \int \Big[ \int f(x, y)\,dx \Big] dy$.

(In particular if one of the three integrals is finite then so are the other two and all
three are equal.)

Proof The difference between “$x$” and “$y$” is only notational. In contrast to the
integration of differential forms, the orientation of the plane or 3-space plays no role
in Lebesgue integration so the Fubini-Tonelli Theorem applies equally to $x$-slicing
and $y$-slicing, which implies that both iterated integrals equal the double integral. □

The multidimensional version of Cavalieri’s Principle yields similar multi-integral
results. See Exercise 54.

When $f$ takes on both signs a little care must be taken to avoid subtracting $\infty$
from $\infty$.


44 Theorem If $f : \mathbb{R}^2 \to \mathbb{R}$ is integrable (the double integral of $f$ exists and is finite)
then the iterated integrals exist and equal the double integral.

Proof Split $f$ into its positive and negative parts, $f = f_+ - f_-$, and apply the
Fubini-Tonelli Theorem to each separately. Since the integrals are finite, subtraction
is legal and the theorem follows for $f$. □


See Exercise 53 for an example in which trouble arises if you forget to assume 
that the double integral is finite. 
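The kind of trouble meant here already shows up for sums, which are integrals with respect to counting measure. The following sketch is a hypothetical discrete analogue and is not taken from Exercise 53: the array `a(i, j)` equals 1 on the diagonal, -1 just to the right of it, and 0 elsewhere, so the sum of its absolute values is infinite.

```python
def a(i, j):
    """Hypothetical array: +1 on the diagonal, -1 just to the right of it."""
    if j == i:
        return 1
    if j == i + 1:
        return -1
    return 0


def rows_then_columns(N):
    """Sum each of the first N rows completely, then add the row sums."""
    # Row i is supported on j = i and j = i + 1, so range(i + 2) captures all of it.
    return sum(sum(a(i, j) for j in range(i + 2)) for i in range(N))


def columns_then_rows(N):
    """Sum each of the first N columns completely, then add the column sums."""
    # Column j is supported on i = j - 1 and i = j, so range(j + 1) captures all of it.
    return sum(sum(a(i, j) for i in range(j + 1)) for j in range(N))


print(rows_then_columns(100), columns_then_rows(100))
```

Every row sums to 0, while the first column sums to 1 and all later columns sum to 0, so the two iterated sums disagree. With a nonnegative array, or one whose absolute values have a finite double sum, this cannot happen.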


8 Vitali Coverings and Density Points 

The fact that every open covering of a closed and bounded subset of Euclidean space 
reduces to a finite subcovering is certainly an important component of basic analysis. 
In this section we present another covering theorem, this time the accent being on 
disjointness of the sets in the subcovering rather than on finiteness. The result is 
used to differentiate Lebesgue integrals. 

Definition A covering $\mathcal{V}$ of a set $A$ in a metric space $M$ is a Vitali covering if for
each point $p \in A$ and each $r > 0$ there is $V \in \mathcal{V}$ such that $p \in V \subset M_r p$ and $V$ is not
merely the singleton set $\{p\}$.

For example, if $A = [a, b]$, $M = \mathbb{R}$, and $\mathcal{V}$ consists of all intervals $[\alpha, \beta]$ with $\alpha < \beta$
and $\alpha, \beta \in \mathbb{Q}$ then $\mathcal{V}$ is a Vitali covering of $A$.


45 Vitali Covering Lemma A Vitali covering of a bounded set $A \subset \mathbb{R}^n$ by closed
balls reduces to an efficient disjoint subcovering of almost all of $A$.

More precisely, given $\epsilon > 0$, $\mathcal{V}$ reduces to a countable subcollection $\{V_k\}$ such that

(a) The $V_k$ are disjoint.

(b) $mU \le m^*A + \epsilon$, where $U = \bigsqcup_{k=1}^{\infty} V_k$.

(c) $A \setminus U$ is a zero set.

Condition (b) is what we mean by $\{V_k\}$ being an “efficient” covering - the extra
points covered form an $\epsilon$-set. The sets $U_N = V_1 \sqcup \dots \sqcup V_N$ “nearly” cover $A$ in the
sense that given $\epsilon > 0$, if $N$ is large then $U_N$ contains $A$ except for an $\epsilon$-set. After
all, $U = \bigcup U_N$ contains $A$ except for a zero set. See also Appendix E.

Boundedness of $A$ is an unnecessary hypothesis. Also, the assumption that the
sets $V \in \mathcal{V}$ are closed balls can be weakened somewhat. We discuss these improvements
after the proof of the result as stated.

Proof of the Vitali Covering Lemma Given $\epsilon > 0$, there is a bounded open set
$W \supset A$ such that $mW \le m^*A + \epsilon$. Define

$\mathcal{V}_1 = \{V \in \mathcal{V} : V \subset W\}$ and $d_1 = \sup\{\operatorname{diam} V : V \in \mathcal{V}_1\}$.

$\mathcal{V}_1$ is still a Vitali covering of $A$. Since $W$ is bounded $d_1$ is finite. Choose $V_1 \in \mathcal{V}_1$ with
$\operatorname{diam} V_1 \ge d_1/2$ and define

$\mathcal{V}_2 = \{V \in \mathcal{V}_1 : V \cap V_1 = \emptyset\}$ and $d_2 = \sup\{\operatorname{diam} V : V \in \mathcal{V}_2\}$.

Choose $V_2 \in \mathcal{V}_2$ with $\operatorname{diam} V_2 \ge d_2/2$. In general,

$\mathcal{V}_k = \{V \in \mathcal{V}_{k-1} : V \cap U_{k-1} = \emptyset\}$

$d_k = \sup\{\operatorname{diam} V : V \in \mathcal{V}_k\}$

$V_k \in \mathcal{V}_k$ has $\operatorname{diam} V_k \ge d_k/2$

where $U_{k-1} = V_1 \sqcup \dots \sqcup V_{k-1}$. This means that $V_k$ has roughly maximal diameter
among the $V \in \mathcal{V}_1$ that do not meet $U_{k-1}$. By construction, the balls $V_k$ are disjoint
and since they lie in $W$ we have $m(\bigsqcup V_k) \le mW \le m^*A + \epsilon$, verifying (a) and (b).
It remains to check (c).

If at any stage in the construction $\mathcal{V}_k = \emptyset$ then we have covered $A$ with finitely
many sets $V_k$, so (c) becomes trivial. We therefore assume that $V_1, V_2, \dots$ form an
infinite sequence. Additivity implies that $m(\bigsqcup V_k) = \sum m(V_k)$. Since each $V_k$ is
contained in $W$ the series converges. This implies that $\operatorname{diam} V_k \to 0$ as $k \to \infty$; i.e.,

(6) $d_k \to 0$ as $k \to \infty$.


For each $N \in \mathbb{N}$ we claim that

(7) $\bigcup_{k=N}^{\infty} 5V_k \supset A \setminus U_{N-1}$

where $5V_k$ denotes the ball $V_k$ dilated from its center by the factor 5. (These dilated
balls need not belong to $\mathcal{V}$.)

Take any $a \in A \setminus U_{N-1}$. Since $U_{N-1}$ is compact and $\mathcal{V}_1$ is Vitali, there is a ball
$B \in \mathcal{V}_1$ such that $a \in B$ and $B \cap U_{N-1} = \emptyset$. That is, $B \in \mathcal{V}_N$. Assume that (7) fails.
Then, for all $k \ge N$ we have

$a \notin 5V_k$.

Therefore $B \not\subset 5V_N$. Figure 150 shows that due to the choice of $V_N$ with roughly
maximal diameter, the fact that $5V_N$ fails to contain $B$ implies that $V_N$ is disjoint
from $B$, so $B \in \mathcal{V}_{N+1}$. This continues for all $k > N$; namely for all $k \ge N$ we have
$B \in \mathcal{V}_k$.

Figure 150 The unchosen ball $B$

Aha!

$B$ was available for choice as the next $V_k$, $k \ge N$, but it was never chosen.
Therefore the chosen $V_k$ has a diameter at least half as large as that of $B$. The latter
diameter is fixed, but (6) states that the former diameter tends to 0 as $k \to \infty$, a
contradiction. Thus (7) is true.

It is easy to see that (7) implies (c). For let $\delta > 0$ be given. Choose $N$ so large
that

$\sum_{k=N}^{\infty} m(V_k) < \frac{\delta}{5^n}$

where $n = \dim \mathbb{R}^n$. Since the series $\sum m(V_k)$ converges this is possible. By (7) and
the scaling law $m(tE) = t^n\, mE$ for $n$-dimensional measure we have

$m^*(A \setminus U_{N-1}) \le \sum_{k=N}^{\infty} m(5V_k) = 5^n \sum_{k=N}^{\infty} m(V_k) < \delta$.

Since $\delta$ is arbitrary, $A \setminus U = \bigcap_k (A \setminus U_k)$ is a zero set. □

Remark A similar strategy of covering reduction appears in the proof in Chapter
2 that sequential compactness implies covering compactness. Formally, the proof is
expressed in terms of the Lebesgue number of the covering but the intuition is this:
Given an open covering $\mathcal{U}$ of a sequentially compact set $K$, you choose a subcovering
by first taking a $U_1 \in \mathcal{U}$ that covers about as much of $K$ as possible, then taking
$U_2 \in \mathcal{U}$ that covers about as much of the remainder of $K$ as possible, and so on.
If finitely many of these sets $U_n$ fail to cover $K$ then you take a sequence $x_n \in
K \setminus (U_1 \cup \dots \cup U_{n-1})$ and prove that it has no subsequence which converges in $K$.
(The contradiction shows that in fact finitely many of the $U_n$ you chose actually did
cover $K$.) In short, when reducing a covering it is a good idea to choose the biggest
sets first. This is exactly the Vitali outlook.
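The "biggest sets first" selection can be sketched for a finite family of closed intervals on the line. This is only an illustration under simplifying assumptions: the family is finite (so it is not a genuine Vitali covering), the greedy step picks an interval of maximal length rather than one of at least half the supremum, and the names `greedy_subfamily`, `dilate`, and the sample `family` are hypothetical. The check at the end mirrors claim (7): every interval of the family lies in the 5-fold dilation of some chosen interval.

```python
from fractions import Fraction


def disjoint(u, v):
    """True when the closed intervals u and v have no common point."""
    return u[1] < v[0] or v[1] < u[0]


def greedy_subfamily(intervals):
    """Repeatedly choose a longest interval disjoint from those already chosen."""
    chosen = []
    for iv in sorted(intervals, key=lambda u: u[1] - u[0], reverse=True):
        if all(disjoint(iv, c) for c in chosen):
            chosen.append(iv)
    return chosen


def dilate(iv, t):
    """Dilate the interval from its center by the factor t."""
    center = (iv[0] + iv[1]) / 2
    radius = (iv[1] - iv[0]) / 2
    return (center - t * radius, center + t * radius)


def contains(big, small):
    return big[0] <= small[0] and small[1] <= big[1]


# A hypothetical overlapping family of closed intervals with rational endpoints.
family = [(Fraction(k, 7), Fraction(k, 7) + Fraction(1 + (k * k) % 5, 10))
          for k in range(30)]
chosen = greedy_subfamily(family)
covered = all(any(contains(dilate(c, 5), iv) for c in chosen) for iv in family)
print(len(chosen), covered)
```

An unchosen interval was skipped because it meets a chosen interval at least as long as itself, which forces it inside the dilation of that chosen interval; this is the one-dimensional, finite version of the argument illustrated by Figure 150.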

Removing the assumption that $A$ is bounded presents no problem. Express $\mathbb{R}^n$
as $\bigsqcup D_i \cup Z$, where the $D_i$ are the open unit cubes defined by the integer lattice and
$Z$ is the zero set formed by the hyperplanes on which at least one coordinate is an
integer. If $A \subset \mathbb{R}^n$ is unbounded then $A = \bigsqcup A_i \cup (A \cap Z)$, where $A_i = A \cap D_i$.
Given a Vitali covering $\mathcal{V}$ of $A$ by closed balls, we set

$\mathcal{V}_i = \{V \in \mathcal{V} : V \subset D_i\}$.

It is a Vitali covering of the bounded set $A_i$ and therefore reduces to a disjoint
$(\epsilon/2^i)$-efficient covering $\{V_{i,k} : k \in \mathbb{N}\}$ of almost all of $A_i$. Thus $\mathcal{V}$ reduces to a disjoint
$\epsilon$-efficient covering $\{V_{i,k} : i, k \in \mathbb{N}\}$ of almost all of $A$.

A further generalization involves the shapes of the sets $V \in \mathcal{V}$. If $|\ |_*$ is any norm
on $\mathbb{R}^n$ then its closed ball of radius $r$ at $p$ is

$B_*(r, p) = \{x \in \mathbb{R}^n : |x - p|_* \le r\}$.

The preceding proof of the Vitali Covering Lemma goes through word for word when
we substitute balls with respect to the norm $|\ |_*$ for Euclidean balls. Even the factor
5 remains the same. See also Exercise 61. If $|\ |_*$ is the maximum coordinate norm,
whose balls are cubes, then this gives the following result.

46 Vitali Covering Lemma for Cubes A Vitali covering of $A \subset \mathbb{R}^n$ by closed
cubes† reduces to an efficient disjoint subcovering of almost all of $A$.


Density Points 


Let $E \subset \mathbb{R}^n$ be measurable. For $p \in \mathbb{R}^n$ define the density of $E$ at $p$ as

$\delta(p, E) = \lim_{Q \downarrow p} \frac{m(E \cap Q)}{mQ}$

if the limit exists, $m$ being Lebesgue measure on $\mathbb{R}^n$. The notation $Q \downarrow p$ indicates
that $Q$ is a cube which contains $p$ and shrinks down to $p$. It need not be centered
at $p$. Clearly $0 \le \delta \le 1$. Points with $\delta = 1$ are called density points of $E$. The
fraction that we’re taking the limit of is the “relative measure” or concentration of
$E$ in $Q$. I like to write the concentration of $E$ in $Q$ as in chemistry,

$\frac{m(E \cap Q)}{mQ} = [E : Q]$.

Existence of $\delta(p, E)$ means that for each $\epsilon > 0$ there exists an $\ell > 0$ such that if $Q$ is
any cube of edgelength $< \ell$ that contains $p$ then the concentration of $E$ in $Q$ differs
from $\delta(p, E)$ by $< \epsilon$.
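The concentration $[E : Q]$ is easy to compute exactly when $E$ is a finite union of intervals on the line. The sketch below assumes $E = [0, 1]$ and one-dimensional "cubes" $Q$ given as intervals; the function names are hypothetical. It contrasts an interior point, where every small $Q$ has concentration 1, with the endpoint 0, where the concentration depends on how $Q$ sits around the point.

```python
from fractions import Fraction


def measure_of_intersection(E, Q):
    """Length of E ∩ Q, where E is a list of disjoint intervals (a, b) and Q = (lo, hi)."""
    lo, hi = Q
    return sum(max(Fraction(0), min(b, hi) - max(a, lo)) for a, b in E)


def concentration(E, Q):
    """The concentration [E : Q] = m(E ∩ Q) / mQ."""
    lo, hi = Q
    return measure_of_intersection(E, Q) / (hi - lo)


E = [(Fraction(0), Fraction(1))]  # assumption: E is the unit interval
for k in range(1, 6):
    h = Fraction(1, 2 ** k)
    p = Fraction(1, 2)
    print(h,
          concentration(E, (p - h, p + h)),         # Q around the interior point 1/2
          concentration(E, (-h, h)),                # Q centered at the endpoint 0
          concentration(E, (Fraction(0), h)),       # Q to the right of 0
          concentration(E, (-h, Fraction(0))))      # Q to the left of 0
```

At the endpoint 0 the one-sided intervals give concentrations 1 and 0, so the limit over all $Q \downarrow 0$ does not exist there, while the centered intervals alone would suggest the value 1/2. A single endpoint is a zero set, so this does not conflict with almost every point of $E$ being a density point.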


Remark Demanding that the cubes be centered at $p$ produces the concept of
balanced density. Balls or certain other shapes can be used instead of cubes. See
Exercise 58, Exercise 61, the end of the preceding section, and Figure 151.


47 Lebesgue Density Theorem If $E$ is measurable then almost every $p \in E$ is a
density point of $E$.


Interior points of E are obviously density points of E , although sets like the 
irrationals or a fat Cantor set have empty interior, while still having plenty of density 
points. 

† The cubes are Cartesian products $I_1 \times \cdots \times I_n$ where the $I_k$ are closed intervals, all of the same
length.

Figure 151 An artist’s rendering of a density point

Proof of the Lebesgue Density Theorem Without loss of generality we assume
$E$ is bounded. Take any $a$, $0 < a < 1$, and consider

$$E_a = \{p \in E : \underline{\delta}(E, p) < a\}$$

where $\underline{\delta}$ is the lower density, $\liminf_{Q \downarrow p} [E : Q]$. We will show that $E_a$ has outer measure
zero.

By assumption, at every $p \in E_a$ there are arbitrarily small cubes in which the
concentration of $E$ is $< a$. These cubes form a Vitali covering of $E_a$ and by the
Vitali Covering Lemma we can select a subcollection $Q_1, Q_2, \ldots$ such that the $Q_k$ are
disjoint, cover almost all of $E_a$, and nearly give the outer measure of $E_a$ in the sense
that

$$\sum_k m(Q_k) < m^*(E_a) + \epsilon.$$

($E_a$ turns out to be measurable but the Vitali Covering Lemma does not require us
to know this in advance.) We get

$$m^*(E_a) = \sum_k m^*(E_a \cap Q_k) \le \sum_k m(E \cap Q_k) \le a \sum_k m(Q_k) \le a\,(m^*(E_a) + \epsilon)$$

which implies that $m^*(E_a) \le a\epsilon/(1 - a)$. Since $\epsilon > 0$ is arbitrary we have $m^*(E_a) = 0$.
The $E_a$ are monotone increasing zero sets as $a \uparrow 1$. Letting $a = 1 - 1/\ell$ with
$\ell = 1, 2, \ldots$, we see that the union of all the $E_a$ with $a < 1$ is also a zero set, say $Z$.
Points $p \in E \setminus Z$ have the property that as $Q \downarrow p$, the lim inf of the concentration
of $E$ in $Q$ is $\ge a$ for all $a < 1$. Since the concentration is always $\le 1$ this means
the limit of the concentration exists and equals 1 for all $p \in E \setminus Z$; i.e., almost every
point of $E$ is a density point of $E$. □


48 Corollary If $E$ is measurable then for almost every $p$ we have

$$\chi_E(p) = \lim_{Q \downarrow p} [E : Q].$$

Proof For almost every $p \in E$ we have $\lim_{Q \downarrow p} [E : Q] = 1$ and for almost every $q \in E^c$
we have $\lim_{Q \downarrow q} [E^c : Q] = 1$. Measurability of $E$ implies $[E : Q] + [E^c : Q] = 1$, which
completes the proof. □


A consequence of the Lebesgue Density Theorem is that measurable sets are not
“diffuse” - a measurable subset of $\mathbb{R}$ cannot meet every interval $(a, b)$ in a set of
measure $c \cdot (b - a)$ where $c$ is a constant, $0 < c < 1$. Instead, a measurable set must
be “concentrated” or “clumpy.” See Exercise 56. Also, looking at the complement
$E^c$ of $E$, we see that almost every point $x \in E^c$ has $\delta(E, x) = 0$. Thus, almost every
point of $E$ is a density point of $E$ and almost every point of $E^c$ is not.

Think of the set of density points of $E$ as the measure-theoretic interior of $E$,
the set of density points of $E^c$ as the measure-theoretic exterior of $E$, and the
remaining set as the measure-theoretic boundary of $E$. We denote the last set as
$\partial_m(E)$. Regularity of Lebesgue measure and the Lebesgue Density Theorem imply
that measurability of $E$ is equivalent to $m(\partial_m(E)) = 0$.

As you might expect, Cavalieri’s Principle meshes well with density points. Recall
that the slice of the undergraph is the undergraph of the slice,

$$(\mathcal{U}f)_x = \mathcal{U}(f_x) \qquad (\mathcal{U}f)^y = \mathcal{U}(f^y),$$

where $f_x(y) = f(x, y) = f^y(x)$.

49 Theorem Density points slice well. 


Proof We assume that $f : \mathbb{R}^n \to [0, \infty)$ is measurable and $(p, y) \in \mathcal{U}f$ has $y > 0$.
Figure 152 shows that $(p, y)$ is a density point of $\mathcal{U}f$ if and only if $p$ is a density point
of $\mathcal{U}(f^y)$. □

Figure 152 The undergraph $\mathcal{U}f$ consists of segments $x \times [0, fx)$. In order
that the union of these segments has high concentration in Q+, the
segments must first cross the bottom face of Q+, namely $Q \times \{y\}$, with
high concentration there. Similarly, if they reach $Q \times \{y\}$ with high
concentration then they first cross Q- with high concentration.


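50 Corollary $(\mathrm{dp}(\mathcal{U}f) \cap \mathcal{U}f)^y = \mathrm{dp}(\mathcal{U}f^y) \cap \mathcal{U}f^y$.


Proof $\mathrm{dp}(\mathcal{U}f)$ refers to the $(n + 1)$-dimensional density points of $\mathcal{U}f$ while $\mathrm{dp}(\mathcal{U}f^y)$
refers to the $n$-dimensional density points of $\mathcal{U}f^y$. The proof is left as Exercise 52. □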
9 Calculus à la Lebesgue

In this section we write the integral of $f$ over a set $A$ as $\int_A f(x)\,dm$. In dimension 1
we write it as $\int_A f(t)\,dt$ or as $\int_\alpha^\beta f(t)\,dt$ when $A = (\alpha, \beta)$.


Definition The average of a locally integrable function $f : \mathbb{R}^n \to \mathbb{R}$ over a
measurable set $A \subset \mathbb{R}^n$ with finite positive measure is

$$\fint_A f(x)\,dm = \frac{1}{mA} \int_A f(x)\,dm.$$

By “locally integrable” we mean “integrable on a small enough neighborhood of each
point in $\mathbb{R}^n$.” One can also write the average of $f$ over $A$ as $[f : A]$. If $\chi_E$ is the
characteristic function of $E$ then $[\chi_E : A] = [E : A]$.


The following result is also called Lebesgue’s Fundamental Theorem of Cal- 
culus. 


51 Average Value Theorem If $f : \mathbb{R}^n \to \mathbb{R}$ is locally integrable then for almost
every $p \in \mathbb{R}^n$ we have

$$\lim_{Q \downarrow p} \fint_Q f(x)\,dm = f(p),$$

where $Q \downarrow p$ means that $Q$ is a cube which contains $p$ and shrinks down to $p$.


52 Lemma If $g : \mathbb{R}^n \to [0, \infty)$ is integrable then for every $\alpha > 0$ the set
$X(\alpha, g) = \{p : \limsup_{Q \downarrow p} \fint_Q g > \alpha\}$ has outer measure

$$m^*(X(\alpha, g)) \le \frac{1}{\alpha} \int g.$$

Proof The set $X(\alpha, g)$ is covered by arbitrarily small cubes on which the average
value of $g$ exceeds $\alpha$. By Vitali’s Covering Lemma we have

$$\bigsqcup_i Q_i \supset X(\alpha, g)$$

up to a zero set, where the average of $g$ on $Q_i$ is $> \alpha$. Hence $\alpha \cdot m(Q_i) < \int_{Q_i} g$ and

$$\alpha \cdot m^*(X(\alpha, g)) \le \sum_i \alpha \cdot m(Q_i) \le \sum_i \int_{Q_i} g \le \int g.$$

Dividing the first and last terms by $\alpha$ gives the assertion. □
Proof of the Average Value Theorem Since $f$ is locally integrable, $\mathbb{R}^n$ is covered
by open sets on which $f$ is integrable. It follows that $f$ is integrable on each
compact cube in $\mathbb{R}^n$. Since $\mathbb{R}^n$ is the monotone union of cubes of integer radius, it is
no loss of generality to assume $f$ is integrable on some large cube $X$ and identically
zero outside $X$.


Fix $\alpha > 0$. Theorem 49 implies that almost every point $p$ in every horizontal slice
of $\mathcal{U}f$ is a density point of the slice. As $Q \downarrow p$ the concentration of $\{x : fx > fp - \alpha\}$
in $Q$ converges to 1, which implies $\liminf_{Q \downarrow p} \fint_Q f \ge fp - \alpha$. Since this is true for each
$\alpha = 1, 1/2, 1/3, \ldots$ we have

$$\liminf_{Q \downarrow p} \fint_Q f \ge fp$$

almost everywhere.


To handle the lim sup we first assume $f$ is bounded, say $f(x) \le M$ for all $x \in X$.
Then $M - f \ge 0$ is integrable on $X$ and $\fint_Q (M - f) = M - \fint_Q f$. Thus

$$\liminf_{Q \downarrow p} \fint_Q (M - f) \ge M - fp$$

for almost every $p \in \mathbb{R}^n$. The relation between liminf and limsup gives

$$\limsup_{Q \downarrow p} \fint_Q f = \limsup_{Q \downarrow p} \fint_Q (f - M) + M = -\liminf_{Q \downarrow p} \fint_Q (M - f) + M \le fp$$

which gives

$$\lim_{Q \downarrow p} \fint_Q f = fp$$

for almost every $p$ when $f$ is bounded.

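For the general integrable $f : X \to [0, \infty)$ we set

$$f_n(x) = \begin{cases} f(x) & \text{if } f(x) \le n \\ n & \text{if } f(x) > n. \end{cases}$$

Then $f_n$ is bounded and $f_n \uparrow f$ as $n \to \infty$. Accordingly for each $n$ there is a zero
set $Z_n$ such that for all $p \notin Z_n$ we have $\lim_{Q \downarrow p} \fint_Q f_n = f_n(p)$. Let $Z_\infty$ be the zero set
$\bigcup Z_n$. If $p \notin Z_\infty$ then for all $n \in \mathbb{N}$ we have

$$\lim_{Q \downarrow p} \fint_Q f_n = f_n(p).$$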
The function $g_n = f - f_n$ is nonnegative, integrable, and $g_n \downarrow 0$ as $n \to \infty$. We
fix $\alpha > 0$ and apply Lemma 52 to $g_n$. The Dominated Convergence Theorem implies
$\int g_n \to 0$ and we get

$$m^*(X(\alpha, g_n)) \le \frac{1}{\alpha} \int g_n \to 0 \quad \text{as } n \to \infty$$

where $X(\alpha, g_n) = \{p \in \mathbb{R}^n : \limsup_{Q \downarrow p} \fint_Q g_n > \alpha\}$. These sets nest downward as $n$
increases, so downward measure continuity implies that their intersection is a zero
set $Z(\alpha) = \bigcap_n X(\alpha, g_n)$.


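Consider each $p \notin Z(\alpha) \cup Z_\infty$. Since $p \notin Z(\alpha)$ there is some $n$ such that
$p \notin X(\alpha, g_n)$. Hence

$$\limsup_{Q \downarrow p} \fint_Q g_n \le \alpha.$$

Since $p \notin Z_\infty$ the average of $f_n$ over $Q$ converges to $f_n p$ as $Q \downarrow p$. Thus

$$\limsup_{Q \downarrow p} \fint_Q f \le \limsup_{Q \downarrow p} \fint_Q f_n + \limsup_{Q \downarrow p} \fint_Q g_n \le f_n p + \alpha \le fp + \alpha.$$

The union of the sets $Z(\alpha)$ with $\alpha = 1, 1/2, 1/3, \ldots$ is a zero set $Z_0$. Thus, if
$p \notin Z_0 \cup Z_\infty$ then for all $k \in \mathbb{N}$ we have

$$fp \le \liminf_{Q \downarrow p} \fint_Q f \le \limsup_{Q \downarrow p} \fint_Q f \le fp + \frac{1}{k}$$

from which it follows that for almost every $p \in \mathbb{R}^n$ the average of $f$ over $Q$ converges
to $fp$ as $Q \downarrow p$. □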


53 Corollary If $f : [a, b] \to \mathbb{R}$ is Lebesgue integrable and

$$F(x) = \int_a^x f(t)\,dt$$

is its indefinite Lebesgue integral then for almost every $x \in [a, b]$ the derivative $F'(x)$
exists and equals $f(x)$.


Remark Here and below the domain of our function is R and we make essential use 
of its one-dimensionality. 
Proof In dimension 1, a cube is a segment, so Theorem 51 gives

$$\frac{F(x + h) - F(x)}{h} = \fint_{[x, x+h]} f(t)\,dt \to f(x)$$

almost everywhere as $h \downarrow 0$. The same holds for $[x - h, x]$. □


Corollary 53 does not characterize indefinite integrals. Mere knowledge that a
continuous function $G$ has a derivative almost everywhere and that its derivative is
an integrable function $f$ does not imply that $G$ differs from the indefinite integral of
$f$ by a constant. The Devil’s staircase function $H$ is a counterexample. Its derivative
exists almost everywhere, $H'(x)$ is almost everywhere equal to the integrable function
$f(x) = 0$, and yet $H$ does not differ from the indefinite integral of 0 by a constant.
The missing ingredient is a subtler form of continuity.


Definition A function $G : [a, b] \to \mathbb{R}$ is absolutely continuous if for each $\epsilon > 0$
there exists $\delta > 0$ such that whenever $I_1, \ldots, I_n$ are disjoint intervals $I_i = (a_i, b_i)$ in $[a, b]$ we have

$$\sum_{i=1}^n (b_i - a_i) < \delta \implies \sum_{i=1}^n |G(b_i) - G(a_i)| < \epsilon.$$
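To see concretely how the Devil’s staircase $H$ fails this definition, here is a minimal Python sketch. It is an illustration under stated assumptions, not a proof: the helper names `cantor` and `cantor_stage` are made up for the example, and `cantor` evaluates $H$ exactly only at points with a terminating ternary expansion (which covers the interval endpoints used here). The $2^n$ closed intervals of the $n$th stage of the Cantor construction are disjoint, have total length $(2/3)^n$, and yet $H$ rises by a total of 1 over them.

```python
from fractions import Fraction

def cantor(x, depth=60):
    """Devil's staircase H(x) for x in [0, 1], read off the ternary digits of x.

    Exact for points with a terminating ternary expansion; otherwise the
    expansion is truncated after `depth` digits (an approximation).
    """
    x = Fraction(x)
    if x <= 0:
        return Fraction(0)
    if x >= 1:
        return Fraction(1)
    value, scale = Fraction(0), Fraction(1, 2)
    for _ in range(depth):
        x *= 3
        digit = int(x)          # next ternary digit: 0, 1, or 2
        x -= digit
        if digit == 1:          # x lies in a removed middle third: H is constant there
            return value + scale
        if digit == 2:
            value += scale
        scale /= 2
        if x == 0:
            break
    return value

def cantor_stage(n):
    """The 2**n closed intervals left after n middle-third removals."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        nxt = []
        for a, b in intervals:
            third = (b - a) / 3
            nxt.append((a, a + third))
            nxt.append((b - third, b))
        intervals = nxt
    return intervals

for n in (1, 5, 10):
    ivs = cantor_stage(n)
    total_length = sum(b - a for a, b in ivs)
    total_rise = sum(abs(cantor(b) - cantor(a)) for a, b in ivs)
    print(n, total_length, total_rise)
```

The total length tends to 0 while the total rise stays equal to 1, so no $\delta$ works for $\epsilon = 1$.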


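54 Proposition Every absolutely continuous function is uniformly continuous. If
$(I_i)$ is a sequence of disjoint intervals $(a_i, b_i) \subset [a, b]$ then the following are equivalent
for a function $G : [a, b] \to \mathbb{R}$.

(a) $\forall \epsilon > 0\ \exists \delta > 0$ such that $\sum_{i=1}^{n} (b_i - a_i) < \delta \implies \sum_{i=1}^{n} |G(b_i) - G(a_i)| < \epsilon$.

(b) $\forall \epsilon > 0\ \exists \delta > 0$ such that $\sum_{i=1}^{\infty} (b_i - a_i) < \delta \implies \sum_{i=1}^{\infty} |G(b_i) - G(a_i)| < \epsilon$.

(c) $\forall \epsilon > 0\ \exists \delta > 0$ such that $\sum_{i=1}^{n} m(I_i) < \delta \implies \sum_{i=1}^{n} m(G(I_i)) < \epsilon$.

(d) $\forall \epsilon > 0\ \exists \delta > 0$ such that $\sum_{i=1}^{\infty} m(I_i) < \delta \implies \sum_{i=1}^{\infty} m(G(I_i)) < \epsilon$.

Also, if $G$ is absolutely continuous and $Z$ is a zero set then $GZ$ is a zero set. Finally,
if $G$ is absolutely continuous and $\epsilon > 0$ is given then there exists $\delta > 0$ such that if $E$
is measurable then $GE$ is measurable and $mE < \delta \implies m(GE) < \epsilon$.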

Proof Assume $G$ is absolutely continuous. For each $\epsilon > 0$ there exists $\delta > 0$ such
that if $\sum (b_i - a_i) < \delta$ then $\sum |G(b_i) - G(a_i)| < \epsilon$. Apply this with just one interval
$(t, x)$. Then $|t - x| < \delta$ implies $|G(t) - G(x)| < \epsilon$, which is uniform continuity.

(a) $\Rightarrow$ (b). (a) is the definition of absolute continuity. In the definition take $\epsilon/2$ in
place of $\epsilon$. The resulting $\delta$ depends on $\epsilon$ but not on $n$. Thus $\sum_{i=1}^{\infty} (b_i - a_i) < \delta$ implies
$\sum_{i=1}^{n} (b_i - a_i) < \delta$ implies $\sum_{i=1}^{n} |G(b_i) - G(a_i)| < \epsilon/2$ implies $\sum_{i=1}^{\infty} |G(b_i) - G(a_i)| \le
\epsilon/2 < \epsilon$, which is (b).

(b) $\Rightarrow$ (c). $m(G(I_i)) = |G(t_i) - G(s_i)|$, where $G(t_i)$ and $G(s_i)$ are the maximum
and minimum of $G$ on $[a_i, b_i]$. Let $J_i$ be the interval between $s_i$ and $t_i$. Then $J_i \subset [a_i, b_i]$
implies $m(J_i) \le m(I_i)$ implies $\sum_{i=1}^{n} m(J_i) < \delta$ implies $\sum_{i=1}^{n} |G(t_i) - G(s_i)| < \epsilon$.
Thus $\sum_{i=1}^{n} m(G(I_i)) = \sum_{i=1}^{n} |G(t_i) - G(s_i)| < \epsilon$, which is (c).

(c) $\Rightarrow$ (d). This is just like (a) $\Rightarrow$ (b).

(d) $\Rightarrow$ (a). Since $m(I_i) = b_i - a_i$ and $|G(b_i) - G(a_i)| \le m(G(I_i))$ this is immediate.

Assume $Z \subset [a, b]$ is a zero set and $G$ is absolutely continuous according to (d).
For each $\epsilon > 0$ there exists $\delta > 0$ such that $\sum m(I_i) < \delta$ implies $\sum m(G(I_i)) < \epsilon$.
There is an open $U \subset [a, b]$ of measure $< \delta$ that contains $Z$. Every such $U$ is a countable
disjoint union of open intervals $I_i$. Their total length is $mU < \delta$. Thus $GZ \subset \bigcup G(I_i)$
and by (d) we have $m(GZ) \le \sum m(G(I_i)) < \epsilon$ so $m(GZ) = 0$.

Assume $E \subset [a, b]$ is measurable and $G$ is absolutely continuous according to (d)
with $\epsilon$, $\delta$ as above. Regularity of Lebesgue measure implies there are compact subsets
$K_n \subset E$ such that $K_n \uparrow F \subset E$, where $Z = E \setminus F$ is a zero set. ($F$ is an $F_\sigma$-set.)
Continuity implies $G(K_n)$ is compact. Since $G(K_n) \uparrow GF$, $GF$ is measurable. Since
$GZ$ is a zero set, $GE = GF \cup GZ$ is measurable. If $mE < \delta$ then there is an open
$U = \bigsqcup I_i \supset E$ with $mU = \sum m(I_i) < \delta$. Then $GE \subset \bigcup G(I_i)$ and by (d) we have
$m(GE) \le \sum m(G(I_i)) < \epsilon$ as desired. □


55 Theorem Let $f : [a, b] \to \mathbb{R}$ be Lebesgue integrable and let $F$ be its indefinite
integral $F(x) = \int_a^x f(t)\,dt$.

(a) For almost every $x$ the derivative $F'(x)$ exists and equals $f(x)$.

(b) $F$ is absolutely continuous.

(c) If $G$ is an absolutely continuous function and $G'(x) = f(x)$ for almost every $x$
then $G$ differs from $F$ by a constant.


As we show in the next section (Corollary 62), the tacit assumption in (c) that
$G'(x)$ exists is redundant. Theorem 55 then gives the following characterization of
indefinite integrals. It is also called Lebesgue’s Main Theorem.

56 Lebesgue’s Antiderivative Theorem Every indefinite integral is absolutely continuous
and conversely, every absolutely continuous function has a derivative almost
everywhere and up to a constant it is the indefinite integral of its derivative.


Proof of Theorem 55 (a) This is Corollary 53.

(b) Without much loss of generality we assume $f \ge 0$. We first suppose that $f$ is
bounded, say $0 \le f(x) \le M$ for all $x$. For each $\epsilon > 0$ the choice of $\delta = \epsilon/M$ gives

$$\sum_i m(F(I_i)) = \sum_i \int_{I_i} f \le \sum_i M\,m(I_i) < \epsilon$$

whenever the $I_i$ are disjoint subintervals of $[a, b]$ having total length $< \delta$. Proposition 54
implies that $F$ is absolutely continuous.

Now assume $f$ is unbounded and $\epsilon > 0$ is given. Choose $M$ so large that

$$m(\{(x, y) \in \mathcal{U}f : fx > M\}) < \epsilon/2.$$

Define the functions

$$g(x) = \begin{cases} fx & \text{if } fx > M \\ 0 & \text{otherwise} \end{cases}$$

and $f_M = f - g$. The integral of $g$ is $< \epsilon/2$ since it is the measure of $\mathcal{U}f$ outside
the rectangle $[a, b] \times [0, M]$. Let $F_M$ and $G$ be the indefinite integrals of $f_M$ and $g$.
Clearly $f = f_M + g$ implies $F = F_M + G$. See Figure 153.

Figure 153 $\int g = m(\mathcal{U}g)$ and $\int f = \int g + \int f_M = m(\mathcal{U}g) + m(\mathcal{U}(f_M))$.

Since $f_M$ is bounded there exists $\delta > 0$ such that

$$\sum_i m(I_i) < \delta \implies \sum_i m(F_M(I_i)) < \epsilon/2$$

where the $I_i$ are disjoint intervals in $[a, b]$. Then $\sum m(I_i) < \delta$ implies

$$\sum_i m(F(I_i)) = \sum_i \int_{I_i} (f_M + g) = \sum_i \int_{I_i} f_M + \sum_i \int_{I_i} g < \epsilon/2 + \int_a^b g < \epsilon,$$

which completes the proof that $F$ is absolutely continuous.

(c) The Lebesgue proof resembles the Riemann proof in Chapter 3 - the Vitali
Covering Lemma replaces the Lebesgue Number Lemma. We assume $G$ is absolutely
continuous and $G'(x) = f(x)$ almost everywhere. When $F$ is the indefinite integral
of $f$ we want to show that $H = F - G$ is constant.


It is easy to see that sums and differences of absolutely continuous functions are
absolutely continuous, so $H$ is absolutely continuous and $H'(x) = 0$ almost everywhere.
Fix any $x^* \in [a, b]$ and define

$$X = \{x \in [a, x^*] : H'(x) \text{ exists and } H'(x) = 0\}.$$

By assumption $mX = x^* - a$.

It is enough to show that for each $\epsilon > 0$ we have

$$|H(x^*) - H(a)| < \epsilon.$$

Absolute continuity implies there is a $\delta > 0$ such that if $I_i = (a_i, b_i) \subset [a, b]$ are
disjoint intervals then

$$\sum_i (b_i - a_i) < \delta \implies \sum_i |H(b_i) - H(a_i)| < \epsilon/2.$$

Fix such a $\delta$. Each $x \in X$ is contained in arbitrarily small intervals $[x, x + h] \subset [a, x^*]$
such that

$$\left| \frac{H(x + h) - H(x)}{h} \right| < \frac{\epsilon}{2(b - a)}.$$

These intervals form a Vitali covering $\mathcal{V}$ of $X$ and the Vitali Covering Lemma implies
that countably many of them, say $V_j = [x_j, x_j + h_j]$, disjointly cover $X$ up to a zero
set. Thus their total length is $\sum h_j = x^* - a$ and it follows that there is an $N$ such
that

$$\sum_{j=1}^{N} h_j > x^* - a - \delta.$$

Since $|H(x + h) - H(x)| < h\epsilon/2(b - a)$ on each $\mathcal{V}$-interval we have

$$\sum_{j=1}^{N} |H(x_j + h_j) - H(x_j)| < \frac{\epsilon}{2(b - a)} \sum_{j=1}^{N} h_j \le \frac{\epsilon\,(x^* - a)}{2(b - a)} \le \epsilon/2.$$
The $N + 1$ intervals $I_j = [a_j, b_j]$ complementary to the (interiors of the) intervals
$V_1, \ldots, V_N$ have total length $< \delta$ so $\sum_{j=0}^{N} |H(b_j) - H(a_j)| < \epsilon/2$ by absolute continuity.
Thus

$$|H(x^*) - H(a)| = \Big| \sum_{j=1}^{N} \big(H(x_j + h_j) - H(x_j)\big) + \sum_{j=0}^{N} \big(H(b_j) - H(a_j)\big) \Big|$$

$$\le \sum_{j=1}^{N} |H(x_j + h_j) - H(x_j)| + \sum_{j=0}^{N} |H(b_j) - H(a_j)| < \epsilon$$

which completes the proof that $G$ differs from $F$ by a constant. □

See Figure 154. 

Figure 154 The complementary intervals $V_j$ and $I_j$


10 Lebesgue’s Last Theorem 

The final theorem in Lebesgue’s groundbreaking book, Leçons sur l’intégration, is
extremely concise and quite surprising.

57 Theorem A monotone function has a derivative almost everywhere. 

Note that no hypothesis is made about continuity of the monotone function.
Considering the fact that a monotone function $[a, b] \to \mathbb{R}$ has only a countable number
of discontinuities, all of jump type, this may seem reasonable, but remember - the
discontinuities may be dense in $[a, b]$. If the monotone function happens to be an
indefinite integral then differentiability was proved in Theorem 55.

We assume henceforth that $f$ is nondecreasing since the nonincreasing case can
be handled by looking at $-f$.
Lebesgue’s proof of Theorem 57 used the full power of the machinery he had
developed for his new integration theory. In contrast, the proof given below is more
direct and geometric. It relies on the Vitali Covering Lemma and the following form
of Chebyshev’s inequality from probability theory.


The slope of $f$ over $[a, b]$ is

$$s = \frac{f(b) - f(a)}{b - a}.$$


58 Chebyshev Lemma Assume that $f : [a, b] \to \mathbb{R}$ is nondecreasing and has slope
$s$ over $I = [a, b]$. If $I$ contains countably many disjoint subintervals $I_k$ and the slope
of $f$ over $I_k$ is $\ge S > s$ then

$$\sum_k |I_k| \le \frac{s}{S}\,|I|.$$

Proof Write $I_k = [a_k, b_k]$. Since $f$ is nondecreasing we have

$$f(b) - f(a) \ge \sum_k \big(f(b_k) - f(a_k)\big) \ge \sum_k S\,(b_k - a_k).$$

Thus $s\,|I| \ge S \sum_k |I_k|$ and the lemma follows. □
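As a sanity check on the Chebyshev Lemma, here is a small Python sketch for one concrete case. The choices $f(x) = x^2$ on $I = [0, 1]$, the two subintervals, and the threshold $S = 3/2$ are all assumptions made for illustration, as is the helper name `chebyshev_check`. It verifies the hypotheses and then compares the total length of the steep subintervals with the bound $(s/S)|I|$.

```python
from fractions import Fraction

def slope(f, a, b):
    """Slope of f over the interval [a, b]."""
    return (f(b) - f(a)) / (b - a)

def chebyshev_check(f, I, subintervals, S):
    """Check the lemma's hypotheses, then return (sum of |I_k|, bound (s/S)|I|)."""
    a, b = I
    s = slope(f, a, b)
    assert S > s, "the lemma needs S > s"
    for u, v in subintervals:
        assert slope(f, u, v) >= S, "each subinterval must have slope >= S"
    total = sum(v - u for u, v in subintervals)
    return total, (s / S) * (b - a)

f = lambda x: x * x                                   # nondecreasing on [0, 1], slope s = 1
I = (Fraction(0), Fraction(1))
subs = [(Fraction(3, 4), Fraction(4, 5)), (Fraction(17, 20), Fraction(1))]
total, bound = chebyshev_check(f, I, subs, Fraction(3, 2))
print(total, bound)
```

Here the steep subintervals have total length 1/5, comfortably below the bound 2/3; the lemma says that no choice of disjoint subintervals with slope at least 3/2 can exceed that bound.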


Remark An extreme case of this situation occurs when the slope is concentrated in 
the three subintervals drawn in Figure 155. 


Proof of Lebesgue’s Last Theorem Not only will we show that $f'(x)$ exists almost
everywhere, but we will also show that $f'(x)$ is a measurable function of $x$
and

$$\int_a^b f'(x)\,dx \le f(b) - f(a). \tag{8}$$

To estimate differentiability one introduces upper and lower limits of slopes called
derivates. If $h > 0$ then $[x, x + h]$ is a “right interval” at $x$ and $(f(x + h) - f(x))/h$
is a “right slope” at $x$. The limsup of the right slopes as $h \to 0$ is called the
right maximum derivate of $f$ at $x$. It is denoted as $D_{\text{right max}} f(x)$. The liminf
of the right slopes is the right minimum derivate of $f$ at $x$ and is denoted as
$D_{\text{right min}} f(x)$. Similar definitions apply to the left of $x$. Think of $D_{\text{right max}} f(x)$ as
the steepest slope at the right of $x$ and $D_{\text{right min}} f(x)$ as the gentlest. See Figure 156.

Figure 155 Chebyshev’s Inequality for slopes

Figure 156 Left and right slopes
There are four derivates. They exist at all points of $[a, b]$ but they can take the
value $\infty$. We first show that two are equal almost everywhere, say the left min and
the right max. Fix any $s < S$ and consider the set

$$E = E_{sS} = \{x \in [a, b] : D_{\text{left min}} f(x) < s < S < D_{\text{right max}} f(x)\}.$$

We claim that

$$m^*E = 0. \tag{9}$$


At each $x \in E$ there are arbitrarily small left intervals $[x - h, x]$ over which the
slope is $< s$. These left intervals form a Vitali covering $\mathcal{L}$ of $E$. (Note that the point $x$
is not the center of its $\mathcal{L}$-interval, but rather it is an endpoint. Also, we do not know
a priori that $E$ is measurable. Luckily, Vitali permits this.) Let $\epsilon > 0$ be given. By
the Vitali Covering Lemma there are countably many disjoint left intervals $L_i \in \mathcal{L}$
that cover $E$, modulo a zero set, and they do so $\epsilon$-efficiently. That is, if we write

$$L = \bigsqcup_i \operatorname{int} L_i$$

then $E \setminus L$ is a zero set and $mL \le m^*E + \epsilon$.


Every $y \in L \cap E$ has arbitrarily small right intervals $[y, y + t] \subset L$ over which the
slope is $> S$. (Here it is useful that $L$ is open.) These right intervals form a Vitali
covering $\mathcal{R}$ of $L \cap E$, and by the Vitali Covering Lemma we can find a countable
number of disjoint intervals $R_j \in \mathcal{R}$ that cover $L \cap E$ modulo a zero set. Since
$L \cap E = E$ modulo a zero set, $R = \bigsqcup R_j$ also covers $E$ modulo a zero set. By the
Chebyshev Lemma we have

$$m^*E \le mR = \sum_j |R_j| = \sum_i \sum_{R_j \subset L_i} |R_j| \le \sum_i \frac{s}{S}\,|L_i| \le \frac{s}{S}\,(m^*E + \epsilon).$$


Since the inequality holds for all $\epsilon > 0$, it holds also with $\epsilon = 0$. Because $s/S < 1$ this
implies that $m^*E = 0$ and completes the proof of (9). Then

$$\{x : D_{\text{left min}} f(x) < D_{\text{right max}} f(x)\} = \bigcup_{\{(s, S) \in \mathbb{Q} \times \mathbb{Q} \,:\, s < S\}} E_{sS}$$

is a zero set. Symmetrically, $\{x : D_{\text{left min}} f(x) > D_{\text{right max}} f(x)\}$ is a zero set, and
therefore $D_{\text{left min}} f(x) = D_{\text{right max}} f(x)$ almost everywhere. Mutual equality of the
other derivates, almost everywhere, is checked in the same way. See Exercise 64.

So far we have shown that for almost every $x \in [a, b]$ the derivative of $f$ at $x$
exists although it may equal $\infty$. Infinite slope is not really acceptable and that is
the purpose of (8) - for an integrable function takes on a finite value at almost every
point.

The proof of (8) uses a cute trick reminiscent of the traveling secant method from
Chapter 3. First extend $f$ from $[a, b]$ to $\mathbb{R}$ by setting $f(x) = f(a)$ for $x < a$ and
$f(x) = f(b)$ for $x > b$. Then define $g_n(x)$ to be the slope of the secant from $(x, f(x))$
to $(x + 1/n, f(x + 1/n))$. That is,

$$g_n(x) = \frac{f(x + 1/n) - f(x)}{1/n} = n\,(f(x + 1/n) - f(x)).$$

See Figure 157.

Figure 157 $g_n(x)$ is the slope of the right secant at $x$.

Since $f$ is almost everywhere continuous it is measurable and so is
$g_n$. For almost every $x$, $g_n(x)$ converges to $f'(x)$ as $n \to \infty$. Hence $f'$ is measurable
and clearly $f' \ge 0$. Fatou’s Lemma gives

$$\int_a^b f'(x)\,dx = \int_a^b \liminf_{n \to \infty} g_n(x)\,dx \le \liminf_{n \to \infty} \int_a^b g_n(x)\,dx.$$

The integral of $g_n$ is

$$\int_a^b g_n(x)\,dx = n \int_b^{b + 1/n} f(x)\,dx - n \int_a^{a + 1/n} f(x)\,dx.$$

The first term equals $f(b)$ since we set $f(x) = f(b)$ for $x > b$. The second term
is at least $f(a)$ since $f$ is nondecreasing. Thus

$$\int_a^b g_n(x)\,dx \le f(b) - f(a),$$

which completes the proof of (8). As remarked before, since the integral of $f'$ is finite,
$f'(x) < \infty$ for almost all $x$, and hence $f$ is differentiable (with finite derivative) almost
everywhere. □
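The inequality (8) can be strict. A minimal Python sketch, under assumptions chosen for illustration: take $f$ to be the unit step on $[0, 1]$ that jumps from 0 to 1 at $x = 1/2$ (so $f' = 0$ almost everywhere and $\int_0^1 f' = 0$), and evaluate the formula above for $\int_a^b g_n$ exactly. The helper names `integral_step` and `integral_of_secant_slopes` are hypothetical.

```python
from fractions import Fraction

def integral_step(u, v, jump=Fraction(1, 2)):
    """Integral over [u, v] of the unit step: f = 0 left of `jump`, f = 1 from `jump` on."""
    return max(Fraction(0), v - max(u, jump))

def integral_of_secant_slopes(n, a=Fraction(0), b=Fraction(1)):
    """n * (integral of f over [b, b+1/n]) - n * (integral of f over [a, a+1/n])."""
    h = Fraction(1, n)
    return n * (integral_step(b, b + h) - integral_step(a, a + h))

for n in (1, 2, 3, 10, 100):
    print(n, integral_of_secant_slopes(n))
```

For every $n \ge 2$ the integral of $g_n$ equals $1 = f(1) - f(0)$, while $\int_0^1 f'(x)\,dx = 0$. The secant slopes carry all of the jump, and Fatou’s Lemma only promises an inequality in the limit.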



59 Corollary A Lipschitz function is almost everywhere differentiable. 


Proof Suppose that $f : [a, b] \to \mathbb{R}$ is Lipschitz with Lipschitz constant $L$. Then for
all $x, y \in [a, b]$ we have

$$|f(y) - f(x)| \le L\,|y - x|.$$

The function $g(x) = f(x) + Lx$ is nondecreasing. Thus $g'$ exists almost everywhere
and so does $f' = g' - L$. □


Remark Corollary 59 remains true for a Lipschitz function $f : \mathbb{R}^n \to \mathbb{R}$; this is
Rademacher’s Theorem, and the proof is much harder.


Definition The variation of a function $f : [a, b] \to \mathbb{R}$ over a partition $X : a =
x_0 < \cdots < x_n = b$ is the sum $\sum_{k=1}^{n} |\Delta_k f|$, where $\Delta_k f = f(x_k) - f(x_{k-1})$. The
supremum of the variations over all partitions $X$ is the total variation of $f$. If the
total variation of $f$ is finite then $f$ is said to be a function of bounded variation.
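The definition is easy to try out numerically. The following Python sketch is only an illustration: the function names `variation` and `uniform_partition` and the test function $f(x) = |x - 1/2|$ on $[0, 1]$ are assumptions made for the example. It computes the variation over a few partitions and shows how it grows under refinement until it reaches the total variation, which is 1 for this $f$.

```python
from fractions import Fraction

def variation(f, partition):
    """Sum of |f(x_k) - f(x_{k-1})| over consecutive points of the partition."""
    return sum(abs(f(x1) - f(x0)) for x0, x1 in zip(partition, partition[1:]))

def uniform_partition(a, b, n):
    """Partition of [a, b] into n subintervals of equal length."""
    return [a + (b - a) * Fraction(k, n) for k in range(n + 1)]

f = lambda x: abs(x - Fraction(1, 2))
zero, one = Fraction(0), Fraction(1)

coarse = variation(f, [zero, one])                    # the endpoints alone miss the dip
thirds = variation(f, uniform_partition(zero, one, 3))
tenths = variation(f, uniform_partition(zero, one, 10))
print(coarse, thirds, tenths)
```

The partition $\{0, 1\}$ sees no variation at all, thirds see 2/3, and any partition containing the point 1/2 sees the full total variation 1.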


60 Theorem A function of bounded variation is almost everywhere differentiable. 


Proof Up to an additive constant, a function of bounded variation can be written
as the difference $f(x) = P(x) - N(x)$, where

$$P(x) = \sup\Big\{ \sum_k \Delta_k f : a = x_0 < \cdots < x_n = x \text{ and } \Delta_k f \ge 0 \Big\}$$

$$N(x) = -\inf\Big\{ \sum_k \Delta_k f : a = x_0 < \cdots < x_n = x \text{ and } \Delta_k f < 0 \Big\}.$$

See Exercise 67. The functions $P$ and $N$ are monotone nondecreasing, so for almost
every $x$ we have $f'(x) = P'(x) - N'(x)$ exists and is finite. □

61 Theorem An absolutely continuous function is of bounded variation. 
Proof Assume that $F : [a, b] \to \mathbb{R}$ is absolutely continuous and take $\epsilon = 1$. There is
a $\delta > 0$ such that if $(a_i, b_i)$ are disjoint intervals in $[a, b]$ then

$$\sum_i (b_i - a_i) < \delta \implies \sum_i |F(b_i) - F(a_i)| < 1.$$

Fix a partition $X$ of $[a, b]$ with $M$ subintervals of length $< \delta$. For any partition $Y : a =
y_0 < \cdots < y_n = b$ of $[a, b]$ we claim that $\sum_k |\Delta_k F| < M$, where $\Delta_k F = F(y_k) - F(y_{k-1})$.
We may assume that $Y$ contains $X$ since adding points to a partition increases the
sum $\sum |\Delta_k F|$. Then

$$\sum_k |\Delta_k F| = \sum_{Y_1} |\Delta_k F| + \cdots + \sum_{Y_M} |\Delta_k F|$$

where $Y_j$ refers to the subintervals of $Y$ that lie in the $j$th subinterval of $X$. The
subintervals in $Y_j$ have total length $< \delta$, so the variation of $F$ over them is $< 1$ and
the total variation of $F$ is $\le M$. □


62 Corollary An absolutely continuous function is almost everywhere differentiable. 

Proof Absolute continuity implies bounded variation implies almost everywhere dif- 
ferentiability. □ 


As mentioned in Section 9, Theorem 55 plus Corollary 62 express Lebesgue’s
Main Theorem,


Indefinite integrals are absolutely continuous and
every absolutely continuous function has a derivative
almost everywhere of which it is the indefinite integral.
Appendix A Lebesgue integrals as limits

The Riemann integral is the limit of Riemann sums. There are analogous “Lebesgue
sums” of which the Lebesgue integral is the limit.

Let $f : \mathbb{R} \to [0, \infty)$ be given, take a partition $Y : 0 = y_0 < y_1 < y_2 < \cdots$ on the
$y$-axis, and set

$$X_i = \{x \in \mathbb{R} : y_{i-1} \le f(x) < y_i\}.$$

(We require that $y_i \to \infty$ as $i \to \infty$.) If $f$ is measurable we define the lower Lebesgue
sum as

$$L(f, Y) = \sum_{i=1}^{\infty} y_{i-1} \cdot m(X_i).$$

$L$ represents the measure of “Lebesgue rectangles” $X_i \times [0, y_{i-1})$ in the undergraph.
If $f$ is measurable† then $L \uparrow \int f$ as the $Y$-mesh tends to 0. It is natural to define the
upper Lebesgue sum as $\sum_{i=1}^{\infty} y_i \cdot m(X_i)$ and to expect that it converges down to $\int f$ as
the $Y$-mesh tends to 0. If $m(\{x : fx > 0\}) < \infty$ then this is true. However, if $f(x)$
is a function like $e^{-x^2}$ then there’s a problem. The first term in the upper Lebesgue
sum is always $\infty$ even though the integral is finite. The simplest solution is to split
the domain into cubes $Q$, work on each separately, and add the results. Then

$$L(f_Q, Y) \le \int_Q f \le U(f_Q, Y),$$

where $L(f_Q, Y) = \sum_{i=1}^{\infty} y_{i-1} \cdot m(X_i \cap Q)$, $U(f_Q, Y) = \sum_{i=1}^{\infty} y_i \cdot m(X_i \cap Q)$, and $f_Q$ is
the restriction of $f$ to $Q$. As the $Y$-mesh tends to 0 the lower and upper Lebesgue
sums converge to the integral, just as in the Riemann case.
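Here is a minimal Python sketch of lower and upper Lebesgue sums in one concrete case. The example is an assumption made for illustration: $f(x) = x^2$ restricted to $Q = [0, 1]$, with the value axis partitioned uniformly by $y_i = i/n$. For this $f$ the sets $X_i \cap Q$ are intervals $[\sqrt{y_{i-1}}, \sqrt{y_i})$, so their measures can be written down directly; the function name `lebesgue_sums` is hypothetical.

```python
import math

def lebesgue_sums(n):
    """Lower and upper Lebesgue sums of f(x) = x**2 on Q = [0, 1] for y_i = i/n."""
    lower = upper = 0.0
    for i in range(1, n + 1):
        y_prev, y_i = (i - 1) / n, i / n
        # X_i ∩ Q = {x in [0, 1] : y_prev <= x**2 < y_i} = [sqrt(y_prev), sqrt(y_i))
        measure = math.sqrt(y_i) - math.sqrt(y_prev)
        lower += y_prev * measure
        upper += y_i * measure
    return lower, upper

for n in (10, 100, 1000):
    lower, upper = lebesgue_sums(n)
    print(n, lower, upper, upper - lower)
```

The two sums squeeze the integral $\int_0^1 x^2\,dx = 1/3$, and their difference is the mesh $1/n$ times $m(Q) = 1$, up to floating-point rounding.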

Upshot Lebesgue sums are like Riemann sums and Lebesgue integration is like Rie- 
mann integration, except that Lebesgue partitions the value axis and takes limits 
while Riemann does the same on the domain axis. 


† We are using the undergraph definition of measurability. Corollary 41 implies that the sets $X_i$
are measurable so the lower Lebesgue sum makes sense.

Appendix B Nonmeasurable sets

If $t \in \mathbb{R}$ is fixed then $t$-translation is the mapping $x \mapsto x + t$. It is a homeomorphism
$\mathbb{R} \to \mathbb{R}$. Think of the circle $S^1$ as $\mathbb{R}$ modulo $\mathbb{Z}$. That is, you identify any $x$ with
$x + n$ for $n \in \mathbb{Z}$. Equivalently, you take the unit interval $[0, 1]$ and you identify 1
with 0. Then $t$-translation becomes rotation by the angle $2\pi t$, and is denoted as
$R_t : S^1 \to S^1$. If $t$ is rational then this rotation is periodic, i.e., for some $n \ge 1$, the
$n$th iterate of $R$, $R^n = R \circ \cdots \circ R$, is the identity map $S^1 \to S^1$. In fact the smallest
such $n$ is the denominator when $t = m/n$ is expressed in lowest terms. On the other
hand, if $t$ is irrational then $R = R_t$ is nonperiodic; every orbit $O(x) = \{R^k(x) : k \in \mathbb{Z}\}$
is denumerable and dense in $S^1$.
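The claim about rational rotations is easy to test by hand or by machine. A minimal Python sketch, with the function name `rotation_period` made up for the example: it iterates $x \mapsto x + t \pmod 1$ starting from 0, in exact rational arithmetic, until the orbit returns to 0.

```python
from fractions import Fraction

def rotation_period(t):
    """Smallest n >= 1 with n*t an integer, found by iterating x -> x + t (mod 1).

    Assumes t is a Fraction; for irrational t the orbit of 0 never returns.
    """
    x, n = Fraction(0), 0
    while True:
        x = (x + t) % 1
        n += 1
        if x == 0:
            return n

for t in (Fraction(1, 2), Fraction(3, 8), Fraction(6, 16), Fraction(5, 7)):
    print(t, rotation_period(t))
```

In each case the period is the denominator of $t$ in lowest terms; `Fraction(6, 16)` reduces to 3/8 and so has period 8.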

63 Theorem Let $t$ be irrational and set $R = R_t$. If $P \subset S^1$ contains exactly one
point of each $R$-orbit then $P$ is nonmeasurable with respect to linear Lebesgue measure
on $S^1$.

Proof The $R$-orbits are disjoint sets, there are uncountably many of them, and they
divide the circle as $S^1 = \bigsqcup_{n \in \mathbb{Z}} R^n(P)$. Translation is a meseometry. It preserves
outer measure, measurability, and measure. So does rotation. Can $P$ be measurable?
No, because if it is measurable with positive measure then we would get

$$m(S^1) = \sum_{n=-\infty}^{\infty} m(R^n P) = \infty,$$

a contradiction, while if $mP = 0$ then $m(S^1) = \sum_{n=-\infty}^{\infty} m(R^n P) = 0$, which contradicts
the fact that $m[0, 1) = 1$. □

But does $P$ exist? The Axiom of Choice states that given any family of nonempty
disjoint sets there exists a set that contains exactly one element from each set. So
if you accept the Axiom of Choice then you apply it to the family of $R$-orbits and
you get an example of a nonmeasurable set $P$, while if you don’t accept the Axiom
of Choice then you’re out of luck.

To increase the pathology of P we next discuss translations in more depth. 

64 Steinhaus’ Theorem If $E \subset \mathbb{R}$ is measurable and has positive measure then
there exists a $\delta > 0$ such that for all $t \in (-\delta, \delta)$, the $t$-translate of $E$ meets $E$.


See also Exercise 57. 


65 Lemma If $F \subset (a, b)$ is measurable and disjoint from its $t$-translate then

$$2mF \le (b - a) + |t|.$$

Proof $F$ and its $t$-translate have equal measure, so if they do not intersect then their
total measure is $2mF$, and any interval that contains them must have length $\ge 2mF$.
If $t > 0$ then $(a, b + t)$ contains $F$ and its $t$-translate, while if $t < 0$ then $(a + t, b)$
contains them. The length of the interval in either case is $(b - a) + |t|$. □

Proof of Steinhaus’ Theorem By the Lebesgue Density Theorem (Theorem 47)
$E$ has lots of density points so we can find an interval $(a, b)$ in which $E$ has
concentration $> 1/2$. Call $F = E \cap (a, b)$. Then $mF > (b - a)/2$. By Lemma 65 if
$|t| < 2mF - (b - a)$ then the $t$-translate of $F$ meets $F$, so the $t$-translate of $E$ meets
$E$, which is what the theorem asserts. □


Now we return to the nonmeasurable set $P$ discussed in Theorem 63. It contains
exactly one point from each $R$-orbit, $R$ being rotation by an irrational $t$. Set

$$A = \bigsqcup_{k \in \mathbb{Z}} R^{2k} P \qquad B = \bigsqcup_{k \in \mathbb{Z}} R^{2k+1} P.$$

The sets $A$, $B$ are disjoint, their union is the circle, and $R$ interchanges them. Since
$R$ preserves outer measure we have $m^*A = m^*B$.

The composite $R^2 = R \circ R$ is rotation by $2t$, also an irrational number. Let $\epsilon > 0$
be given. Since the orbit of 0 under $R^2$ is dense there is a large integer $k$ with

$$|R^{2k}(0) - (-t)| < \epsilon.$$

For $R^{2k}$ is the $k$th iterate of $R^2$. Thus $|R^{2k+1}(0)| < \epsilon$ so $R^{2k+1}$ is a rotation by $< \epsilon$.
Odd powers of $R$ interchange $A$ and $B$, so odd powers of $R$ translate $A$ and $B$ off
themselves. It follows from Steinhaus’ Theorem that $A$ and $B$ contain no subsets of
positive measure. Their inner measures are zero.

The general formula $mC = m_*A + m^*B$ in Lemma 20 implies that $m^*B = 1$.
Thus we get an extreme type of nonmeasurability expressed in the next theorem.

66 Theorem The circle, or equivalently $[0, 1)$, splits into two nonmeasurable disjoint
subsets that each has inner measure zero and outer measure one.

67 Corollary Every measurable set $E \subset \mathbb{R}^n$ of positive measure contains a
doppelgänger - a nonmeasurable subset $N$ such that $m^*N = mE$, $m_*N = 0$, and $N$
“spreads itself evenly” throughout $E$ in the sense that if $E' \subset E$ is measurable then
$m^*(N \cap E') = m(E')$.


The proof is left to you as Exercise 50. 
Appendix C Borel versus Lebesgue

A valid criticism of Lebesgue theory as described in this chapter is that it conflicts a
bit with topology, and problems arise if you try to think of Lebesgue measure theory
in category terms. For example, not all homeomorphisms are meseomorphisms and
composition of Lebesgue measurable functions can fail to be Lebesgue measurable.
See Exercise 79.

To repair these defects Émile Borel proposed replacing the $\sigma$-algebra $\mathcal{M}$ of
Lebesgue measurable sets with a smaller one, $\mathcal{B} \subset \mathcal{M}$, and restricting Lebesgue
measure to it. $\mathcal{B}$ is simply the intersection of all $\sigma$-algebras that include the open
sets. There is at least one such $\sigma$-algebra, namely $\mathcal{M}$, so $\mathcal{B}$ exists and is contained in $\mathcal{M}$.
It includes all $G_\delta$-sets (countable intersections of open sets), all $G_{\delta\sigma}$-sets (countable
unions of $G_\delta$-sets), etc. Thus $\mathcal{G}_\delta \subset \mathcal{G}_{\delta\sigma} \subset \mathcal{G}_{\delta\sigma\delta} \subset \cdots \subset \mathcal{B}$, where $\mathcal{G}_\delta$ is the collection
of all $G_\delta$-sets, $\mathcal{G}_{\delta\sigma}$ is the collection of all $G_{\delta\sigma}$-sets, etc. Likewise $\mathcal{F}_\sigma \subset \mathcal{F}_{\sigma\delta} \subset \cdots \subset \mathcal{B}$
for $F_\sigma$-sets, $F_{\sigma\delta}$-sets, etc. See Exercise 8.

A set is Borel measurable if it belongs to $\mathcal{B}$, and a nonnegative function is
Borel measurable if its undergraph is a Borel measurable set. Equivalently a function
is Borel measurable if the preimage of a Borel set is always Borel. The measure of
$F \in \mathcal{B}$ is its Lebesgue measure and the integral of a Borel measurable function is its
Lebesgue integral. All continuous functions are Borel measurable and the composition
of Borel measurable functions is Borel measurable. That’s good.

However, $\mathcal{B}$ has its own defects, the main one being that it is not complete. That
is, not all subsets of a zero set are Borel measurable. (Recall that every subset of 
a zero set is Lebesgue measurable.) In the same vein, the limit of a sequence of 
Borel measurable functions that converge almost everywhere can fail to be Borel 
measurable. See Exercise 80. 

I chose not to use the Borel approach in this chapter because it adds an extra 
layer of complication to the basic Lebesgue theory. You could not state the Monotone
Convergence Theorem as “if $f_n$ is (Borel) measurable and $f_n \uparrow f$ then $\int f_n \uparrow \int f$.”
No. You would also need to assume $f$ is Borel measurable.

But the real reason I chose $\mathcal{M}$ over $\mathcal{B}$ is that I like pathology. The fact that there
are ugly zero sets - zero sets carried by homeomorphisms to nonmeasurable sets - is 
eye-opening. I want you to see them as part of the Lebesgue picture. 

Here are a couple of relevant remarks from mathoverflow in answer to the ques- 
tion “Why do probabilists take random variables to be Borel (and not Lebesgue) 
measurable?” 
Yuval Peres: One reason is that probabilists often consider more than
one measure on the same space, and then a negligible set for one measure 
(added in a completion) might be not negligible for the other. The situa- 
tion becomes more acute when you consider uncountably many different 
measures (such as the distributions of a Markov process with different 
starting points.) 


Terry Tao: This is also a reason why the Borel sigma algebra on the 
domain is often preferred in ergodic theory. (A closely related reason is 
because of the connection between ergodic theory and topological dynam- 
ics; a topological dynamical system has a canonical Borel sigma algebra 
but not a canonical Lebesgue sigma algebra.) On the other hand, a signif- 
icant portion of ergodic theory is also concerned with almost everywhere 
convergence (wrt some reference invariant measure, of course), and then 
it becomes useful for the domain sigma algebra to be complete... 


Appendix D The Banach-Tarski Paradox

If the nonmeasurable examples in Appendix B do not disturb you enough, here is a 
much worse one. You can read about it in Stan Wagon’s book, The Banach-Tarski 
Paradox. Many other paradoxes are discussed there too. 


The solid unit ball in 3-space can be divided into five disjoint sets, $A_1, \dots, A_5$, and the $A_i$ can be moved by rigid motions to new disjoint sets $A_i'$ whose union is two disjoint unit balls. The Axiom of Choice is fundamental in the construction, as is dimensionality greater than two. The sets $A_i$ are nonmeasurable.


Think of this from an alchemist’s point of view. A one-inch gold ball can be cut
into five disjoint pieces and the pieces rigidly reassembled to make two one-inch gold
balls. Repeating the process would make you very rich.
Appendix E Riemann integrals as undergraphs

The geometric description of the Lebesgue integral as the measure of the undergraph 
has a counterpart for Riemann integrals. 

68 Theorem A function $f : [a, b] \to [0, M]$ is Riemann integrable if and only if the topological boundary of its undergraph is a zero set, $m(\partial(\mathcal{U}f)) = 0$.


Remark Recall from page 424 that the measure-theoretic boundary of a set $E$ is

$$\partial_m(E) = \{p : p \text{ is a density point of neither } E \text{ nor } E^c\}$$

and measurability of $E$ is equivalent to $\partial_m(E)$ being a zero set. A function $f : [a, b] \to [0, M]$ is Lebesgue integrable if and only if $\mathcal{U}f$ is measurable, i.e., if and only if $\partial_m(\mathcal{U}f)$ is a zero set. Combined with Theorem 68 this gives a nice geometric parallel between Riemann and Lebesgue integrability:

$$f \text{ is Riemann integrable} \iff m(\partial(\mathcal{U}f)) = 0.$$

$$f \text{ is Lebesgue integrable} \iff m(\partial_m(\mathcal{U}f)) = 0.$$


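Remark Since $\partial(\mathcal{U}f) = \overline{\mathcal{U}f} \setminus \operatorname{int}(\mathcal{U}f)$, equivalent to $m(\partial(\mathcal{U}f)) = 0$ is $m(\operatorname{int}(\mathcal{U}f)) = m(\overline{\mathcal{U}f})$.
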
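69 Lemma If $X$ is a metric space, $f : X \to [0, \infty)$, and

$$\underline{f}(x) = \liminf_{t \to x} f(t), \qquad \overline{f}(x) = \limsup_{t \to x} f(t)$$

then $\mathcal{U}\underline{f} = \operatorname{int}(\mathcal{U}f)$ and $\widehat{\mathcal{U}}\overline{f} = \overline{\mathcal{U}f}$. (Here $\widehat{\mathcal{U}}$ denotes the completed undergraph, in which $y \le f(x)$ is allowed.)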

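Proof Take any $(x,y) \in \mathcal{U}\underline{f}$. Then $y < \underline{f}(x)$ and for all $(t, s)$ near $(x,y)$ we have $s < f(t)$. Thus $(t, s) \in \mathcal{U}f$, $(x,y) \in \operatorname{int}(\mathcal{U}f)$, and $\mathcal{U}\underline{f} \subset \operatorname{int}(\mathcal{U}f)$. The proof of the reverse inclusion is similar, so $\mathcal{U}\underline{f} = \operatorname{int}(\mathcal{U}f)$. See Figure 158.

The proof that $\widehat{\mathcal{U}}\overline{f} = \overline{\mathcal{U}f}$ is slightly different. If $(x, y) \in \widehat{\mathcal{U}}\overline{f}$ then $y \le \overline{f}(x)$ so there exists $t_n \to x$ such that $f(t_n) \to \overline{f}(x)$. Choose $y_n < f(t_n)$ such that $y_n \to y$. Thus $(t_n, y_n) \in \mathcal{U}f$, $(t_n, y_n) \to (x,y)$, $(x,y) \in \overline{\mathcal{U}f}$, and $\widehat{\mathcal{U}}\overline{f} \subset \overline{\mathcal{U}f}$. Conversely, if $(x,y) \in \overline{\mathcal{U}f}$ then there exists $(t_n, y_n) \in \mathcal{U}f$ such that $(t_n, y_n) \to (x,y)$. Then $y_n < f(t_n)$ and

$$\limsup_{n \to \infty} f(t_n) \ge \lim_{n \to \infty} y_n = y.$$

Thus, $y \le \overline{f}(x)$, $(x,y) \in \widehat{\mathcal{U}}\overline{f}$, and $\overline{\mathcal{U}f} \subset \widehat{\mathcal{U}}\overline{f}$, giving equality, $\widehat{\mathcal{U}}\overline{f} = \overline{\mathcal{U}f}$. See Figure 158. □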
Figure 158 The shaded region is contained in the interior of $\mathcal{U}f$.


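Proof of Theorem 68 Applying Lemma 69 to $f : [a, b] \to [0, M]$ gives

$$\mathcal{U}\underline{f} = \operatorname{int}(\mathcal{U}f) \quad \text{and} \quad \widehat{\mathcal{U}}\overline{f} = \overline{\mathcal{U}f}.$$

Since open sets and closed sets are measurable, this implies $\underline{f}$ and $\overline{f}$ are measurable functions. Thus

$$m(\partial(\mathcal{U}f)) = m(\overline{\mathcal{U}f} \setminus \operatorname{int}(\mathcal{U}f)) = m(\widehat{\mathcal{U}}\overline{f}) - m(\mathcal{U}\underline{f}) = \int_{[a,b]} \overline{f} - \underline{f}.$$

The integral is zero if and only if $\underline{f} = \overline{f}$ almost everywhere, i.e., if and only if $f$ is continuous almost everywhere, i.e., by the Riemann-Lebesgue Theorem (Theorem 23 in Chapter 3) if and only if $f$ is Riemann integrable. □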


70 Corollary If f is Riemann integrable then it is Lebesgue integrable and the two 
integrals are equal. 


Proof Since

$$\operatorname{interior} \mathcal{U}f \subset \mathcal{U}f \subset \operatorname{closure} \mathcal{U}f,$$

equality of the measures of its interior and closure implies that $\mathcal{U}f$ is measurable, and it shares their common measure. Since the Lebesgue integral of $f$ equals $m(\mathcal{U}f)$ the proof is complete. □


Remark The undergraph definition of integrals has a further expression in terms of
Jordan content: The Riemann integral of a function $f : [a, b] \to [0, M]$ is the Jordan
content of its undergraph, $J(\mathcal{U}f)$, provided that $J(\mathcal{U}f)$ exists. See Exercises 11 - 14.
In brief, undergraphs lead to natural pictorial ways of dealing with integrals, both
Riemann and Lebesgue.
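
The Jordan content picture can be tried out numerically. The following is a minimal Python sketch, not part of the text: the function name `inner_outer_rectangles` and the choice of $f(x) = x^2$ on $[0, 1]$ are assumptions made only for illustration. It squeezes the undergraph between a finite union of rectangles lying inside it and a finite union covering it, which is the kind of approximation that inner and outer Jordan content are built from.

```python
from fractions import Fraction


def inner_outer_rectangles(f, n):
    """Areas of rectangle unions inside and around the undergraph of f on [0, 1].

    Assumes f is nondecreasing and nonnegative, so on each subinterval the
    minimum is taken at the left endpoint and the maximum at the right one.
    """
    width = Fraction(1, n)
    inner = sum(f(Fraction(i, n)) * width for i in range(n))
    outer = sum(f(Fraction(i + 1, n)) * width for i in range(n))
    return inner, outer


def square(x):
    return x * x


for n in (10, 100, 1000):
    inner, outer = inner_outer_rectangles(square, n)
    # The gap between the two areas is (f(1) - f(0)) / n, which shrinks to 0.
    print(n, float(inner), float(outer), outer - inner)
```

For this monotone example the two areas differ by exactly $1/n$ and both trap the value $1/3$, so the undergraph has Jordan content $1/3$, in agreement with the Riemann integral of $x^2$ over $[0, 1]$.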
Appendix F Littlewood’s Three Principles

In the following excerpt from his book on complex analysis, Lectures on the Theory
of Functions, J.E. Littlewood seeks to demystify Lebesgue theory. It owes some of
its popularity to its prominence in Royden’s classic text, Real Analysis.

The extent of knowledge [of real analysis] required is nothing like as great 
as is sometimes supposed. There are three principles, roughly expressible 
in the following terms: Every (measurable) set is nearly a finite sum of 
intervals; every function (of class $L^\lambda$) is nearly continuous; and every
convergent sequence of functions is nearly uniformly convergent. Most of 
the results of the present section are fairly intuitive applications of these 
ideas, and the student armed with them should be equal to most occasions 
when real variable theory is called for. If one of the principles would be 
the obvious means to settle a problem if it were “quite” true, it is natural 
to ask if the “nearly” is near enough, and for a problem that is actually 
soluble it generally is.†


Littlewood’s First Principle expresses the regularity of Lebesgue measure
(Theorem 16). Given $\epsilon > 0$, a measurable $E \subset [a, b]$ contains a compact subset
covered by finitely many intervals whose union differs from $E$ by a set of measure
less than $\epsilon$. In that sense, $E$ is nearly a finite union of intervals. I like very much
Littlewood’s choice of the term “nearly,” meaning “except for an $\epsilon$-set,” to contrast
with “almost,” meaning “except for a zero set.”
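
To make “nearly a finite union of intervals” concrete, here is a small Python sketch. It is only an illustration under stated assumptions: the particular fat Cantor set (stage $k$ removes an open middle interval of length $4^{-k}$ from each surviving interval, leaving a limit set of measure $1/2$) and the names `fat_cantor_stage` and `total_length` are my choices, not taken from the text. The stage-$n$ set is a union of $2^n$ intervals that contains the fat Cantor set, and the sketch measures how much it overshoots.

```python
from fractions import Fraction


def fat_cantor_stage(n):
    """Return the 2**n closed intervals left after n removal stages.

    Stage k removes the open middle interval of length 4**-k from each
    of the 2**(k-1) intervals that survived the previous stage.
    """
    intervals = [(Fraction(0), Fraction(1))]
    for k in range(1, n + 1):
        gap = Fraction(1, 4**k)
        next_intervals = []
        for left, right in intervals:
            mid = (left + right) / 2
            next_intervals.append((left, mid - gap / 2))
            next_intervals.append((mid + gap / 2, right))
        intervals = next_intervals
    return intervals


def total_length(intervals):
    return sum(right - left for left, right in intervals)


LIMIT_MEASURE = Fraction(1, 2)  # measure of the limiting fat Cantor set

for n in range(1, 6):
    excess = total_length(fat_cantor_stage(n)) - LIMIT_MEASURE
    # excess is the measure of (finite union of intervals) minus (the set)
    print(n, 2**n, excess)
```

In this example the excess after $n$ stages is $2^{-(n+1)}$, so choosing $n$ with $2^{-(n+1)} < \epsilon$ exhibits the set as a finite union of intervals up to an $\epsilon$-set.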


Littlewood’s Second Principle refers to “functions of class $L^\lambda$,” although
he might better have said “measurable functions.” He means that if you have a
measurable function and you are given $\epsilon > 0$ then you can discard an $\epsilon$-set from
its domain of definition and the result is a continuous function. This is Lusin’s
Theorem: a measurable function is nearly continuous.


†Reprinted from Lectures on the Theory of Functions by J.E. Littlewood (1994) by permission of Oxford University Press.

Proof of Lusin’s Theorem We assume that $f : \mathbb{R} \to \mathbb{R}$ is measurable and $\epsilon > 0$ is given. We use the fact that $\mathbb{R}$ has a countable base $\mathcal{Y} = \{Y_1, Y_2, \dots\}$ for its topology. (This means every open subset of $\mathbb{R}$ can be expressed as a union of some of the members of $\mathcal{Y}$. For instance, we could take $\mathcal{Y}$ to be the collection of all open intervals with rational endpoints.)

Using the preimage definition of measurability we know that $f^{\mathrm{pre}}(Y_k)$ is measurable so there exists a sandwich $K_k \subset f^{\mathrm{pre}}(Y_k) \subset U_k$ where $K_k$ is closed, $U_k$ is open, and $m(U_k \setminus K_k) < \epsilon/2^k$. Thus $S = \bigcup (U_k \setminus K_k)$ is an open set with $mS < \epsilon$. We claim that $g = f|_K$ is continuous, where $K$ is the closed set $\mathbb{R} \setminus S$. By De Morgan’s Law we have

$$K = S^c = \bigcap_{k=1}^{\infty} (K_k \cup U_k^c)$$

and therefore

$$\begin{aligned} g^{\mathrm{pre}}(Y_k) &= f^{\mathrm{pre}}(Y_k) \cap K \subset U_k \cap K \\ &= U_k \cap \bigcap_{j=1}^{\infty} (K_j \cup U_j^c) \subset U_k \cap (K_k \cup U_k^c) \\ &= U_k \cap K_k = K_k \subset f^{\mathrm{pre}}(Y_k). \end{aligned}$$

Thus every point of $U_k \cap K$ lies in $f^{\mathrm{pre}}(Y_k) \cap K = g^{\mathrm{pre}}(Y_k)$. Hence $g^{\mathrm{pre}}(Y_k) = U_k \cap K$ is open in $K$.

Now if $V$ is an arbitrary open subset of $\mathbb{R}$ then it is the union of some members of $\mathcal{Y}$, say $V = \bigcup_{\ell \in L(V)} Y_\ell$, where $L(V) \subset \mathbb{N}$. Then $g^{\mathrm{pre}}(V) = \bigcup_{\ell \in L(V)} g^{\mathrm{pre}}(Y_\ell)$ is open in $K$ which gives continuity of $g$. □


Littlewood’s Third Principle concerns a sequence of measurable functions
$f_n : [a, b] \to \mathbb{R}$ that converges almost everywhere to a limit. Except for an $\epsilon$-set the
convergence is actually uniform, which is Egoroff’s Theorem: Almost everywhere
convergence implies nearly uniform convergence.


Proof of Egoroff’s Theorem Set

$$X(k, \ell) = \{x \in [a, b] : \forall n \ge k \text{ we have } |f_n(x) - f(x)| < 1/\ell\}.$$

Fix $\ell \in \mathbb{N}$. Since $f_n(x) \to f(x)$ for almost every $x$ we have $\bigcup_k X(k, \ell) \cup Z(\ell) = [a, b]$ where $Z(\ell)$ is a zero set.

Let $\epsilon > 0$ be given. By measure continuity $m(X(k, \ell)) \to b - a$ as $k \to \infty$. This implies we can choose $k_1 < k_2 < \dots$ such that for $X_\ell = X(k_\ell, \ell)$ we have $m(X_\ell^c) < \epsilon/2^\ell$. Thus $m(X^c) < \epsilon$ where $X = \bigcap_\ell X_\ell$.

We claim that $f_n$ converges uniformly on $X$. Given $\sigma > 0$ we choose and fix $\ell$ such that $1/\ell < \sigma$. For all $n \ge k_\ell$ we have

$$x \in X \implies x \in X_\ell = X(k_\ell, \ell) \implies |f_n(x) - f(x)| < 1/\ell < \sigma.$$

Hence $f_n$ converges uniformly to $f$ off the $\epsilon$-set $X^c$. (We used $\sigma$ to avoid writing $\epsilon$ with two different meanings.) □


See also Exercise 83. 
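
A standard example, which I add here only as an illustration, is $f_n(x) = x^n$ on $[0, 1]$. It converges almost everywhere to $f = 0$ but not uniformly, and yet the convergence is uniform once the $\epsilon$-set $(1 - \epsilon, 1]$ is discarded. The Python sketch below estimates the relevant suprema on a uniform grid; the function name `sup_on_grid` and the grid size are assumptions of the sketch, and a grid only approximates a true supremum.

```python
def sup_on_grid(n, right_end, samples=1000):
    """Approximate the sup of |x**n - 0| over [0, right_end] on a uniform grid."""
    return max((right_end * i / samples) ** n for i in range(samples + 1))


eps = 0.01
for n in (10, 100, 1000):
    whole = sup_on_grid(n, 1.0)          # includes x = 1, where x**n = 1
    trimmed = sup_on_grid(n, 1.0 - eps)  # the eps-set (1 - eps, 1] is discarded
    print(n, whole, trimmed)
```

On the whole interval the estimated supremum stays at 1 for every $n$, while on $[0, 1 - \epsilon]$ it equals $(1 - \epsilon)^n$, which tends to 0.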
Appendix G Roundness

The density of a set $E$ at $p$ is the limit, if it exists, of the concentration of $E$ in a ball or cube that shrinks down to $p$. What if you used another shape such as an ellipsoid or solid torus? Would it matter? The answer is “somewhat.”

Let us say that a neighborhood $U$ of $x$ is $K$-quasi-round if it can be sandwiched between balls $B \subset U \subset B'$ with $\operatorname{diam} B' \le K \operatorname{diam} B$. A ball is 1-quasi-round while a square is $\sqrt{2}$-quasi-round.

It is not hard to check that if $x$ is a density point with respect to balls then it is also
a density point with respect to $K$-quasi-round neighborhoods of $x$, provided that
$K$ is fixed as the neighborhoods shrink to $x$. See Exercises 60 and 61. When the
neighborhoods are not quasi-round, the density point analysis becomes marvelously 
complicated. See Falconer’s book, The Geometry of Fractal Sets. 


Appendix H Money 

Riemann and Lebesgue walk into a room and find a table covered with hundreds of 
U.S. coins. (Well, . . . ) How much money is there? 

Riemann solves the problem by taking the coins one at a time and adding their 
values as he goes. As he picks up a penny, a nickel, a quarter, a dime, a penny, etc., 
he counts: “1 cent, 6 cents, 31 cents, 41 cents, 42 cents, etc.” The final number is
Riemann’s answer.

In contrast, Lebesgue first sorts the coins into piles of the same value (partitioning 
the value axis and taking preimages); he then counts each pile (applying counting 
measure); and he sums the six terms, “value $v$ times number of coins with value $v$,”
and that is his answer. 

Lebesgue’s answer and Riemann’s answer are of course the same number. It is 
their methods of calculating that number which differ. 
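
The two counting methods are easy to imitate in code. The following Python sketch is an illustration only; the function names `riemann_total` and `lebesgue_total` and the short coin list are my own choices. The first function adds values in the order the coins are picked up, and the second sorts the coins into piles by value before summing value times count.

```python
from collections import Counter


def riemann_total(coins):
    """Add coin values one at a time, in the order they are picked up."""
    running = 0
    for value in coins:
        running += value
    return running


def lebesgue_total(coins):
    """Sort coins into piles by value, count each pile, sum value * count."""
    piles = Counter(coins)  # maps value -> number of coins with that value
    return sum(value * count for value, count in piles.items())


# penny, nickel, quarter, dime, penny (values in cents)
coins = [1, 5, 25, 10, 1]
print(riemann_total(coins), lebesgue_total(coins))  # prints "42 42"
```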

Now imagine that you walk into the room and behold this coin-laden table. Which 
method would you actually use to find out how much money there is - Riemann’s or 
Lebesgue’s? This amounts to the question: Which is the “better” integration theory? 
As an added twist suppose you have only sixty seconds to make a good guess. What 
would you do then? 
Exercises

1. (a) Show that the definition of linear outer measure is unaffected if we demand that the intervals $I_k$ in the coverings be closed instead of open.

(b) Why does this immediately imply that the middle-thirds Cantor set has linear outer measure zero?

(c) Show that the definition of linear outer measure is unaffected if we drop all openness/closedness requirements on the intervals $I_k$ in the coverings.

(d) What about planar outer measure? Specifically, what if we demand that the rectangles be squares?

2. The volume of an $n$-dimensional box is the product of the lengths of its edges and the outer measure of $A \subset \mathbb{R}^n$ is the infimum of the total volume of countable coverings of $A$ by open boxes.

(a) Write out the proof of the outer measure axioms for subsets of $\mathbb{R}^n$.

(b) Write out the proof that the outer measure of a box equals its volume.

3. A line in the plane that is parallel to one of the coordinate axes is a planar zero 
set because it is the Cartesian product of a point (it’s a linear zero set) and R. 

(a) What about a line that is not parallel to a coordinate axis? 

(b) What is the situation in higher dimensions? 

4. The proof of Lemma 11 was done in the plane. The key insight was that a 
square S contains a disc A such that mA/mS > 1/2. Find a corresponding 
inequality in n-space and write out the n-dimensional proofs of the lemma and 
Theorem 9 carefully. 

5. Prove that every closed set in $\mathbb{R}$ or $\mathbb{R}^n$ is a $G_\delta$-set. Does it follow at once that
every open set is an $F_\sigma$-set? Why?

6. Complete the proofs of Theorems 16 and 21 in the unbounded, n-dimensional 
case. [Hint: How can you break an unbounded set into countably many disjoint 
bounded pieces?] 

7. Show that inner measure is translation invariant. How does it behave under 
dilation? Under affine motions? 

*8. Prove that $\mathbb{R} \setminus \mathbb{Q}$ is an $F_{\sigma\delta}$-set but not an $F_\sigma$-set. [Hint: Baire.] Infer that
$\mathcal{F}_{\sigma\delta} \neq \mathcal{F}_\sigma$. You can google “Descriptive Set Theory” for further inequalities like
this.

9. Theorem 16 implies that if E is measurable then its inner and outer measures 
are equal. Is the converse true? [Proof or counterexample.] 

10. For an arbitrary set $M$ define $\omega : 2^M \to [0, \infty]$ as $\omega(S) = \#(S)$, where $2^M$ is the power set of $M$ (the collection of all subsets of $M$) and $\#(S)$ is the cardinality of $S$. Prove that $\omega$ is an abstract outer measure and all sets $S \subset M$ are measurable. [This is counting measure. It makes frequent appearances in counterexamples in abstract measure theory.]

11. The outer Jordan content of a bounded set $A \subset \mathbb{R}$ is the infimum of the total lengths of finite coverings of $A$ by open intervals,

$$J^*A = \inf \left\{ \sum_{k=1}^{n} |I_k| : \text{each } I_k \text{ is an open interval and } A \subset \bigcup_{k=1}^{n} I_k \right\}.$$

The corresponding definitions of outer Jordan content in the plane and $n$-space substitute rectangles and boxes for intervals.

(a) Show that outer Jordan content satisfies

(i) $J^*(\emptyset) = 0$.

(ii) If $A \subset B$ then $J^*A \le J^*B$.

(iii) If $A = \bigcup_{k=1}^{n} A_k$ then $J^*A \le \sum_{k=1}^{n} J^*A_k$.

(b) (iii) is called finite subadditivity. Find an example of a set $A \subset [0, 1]$ such that $A = \bigcup_{k=1}^{\infty} A_k$, $J^*A_k = 0$ for all $k$, and $J^*A = 1$, which shows that finite subadditivity does not imply countable subadditivity and that $J^*$ is not an outer measure.

(c) Why is it clear that $m^*A \le J^*A$, and that if $A$ is compact then $mA = J^*A$? What about the converse?

(d) Show that the requirement that the intervals in the covering of $A$ be open is irrelevant.

12. Prove that

$$J^*A = J^*\overline{A} = m\overline{A}$$

where $\overline{A}$ is the closure of $A$.

13. If $A$, $B$ are compact prove that

$$J^*(A \cup B) + J^*(A \cap B) = J^*A + J^*B.$$

[Hint: Is the formula true for Lebesgue measure? Use Exercise 12.]

14. The inner Jordan content of a subset $A$ of an interval $I$ is

$$J_*A = |I| - J^*(I \setminus A).$$

(a) Show that

$$J_*A = m(\operatorname{interior} A).$$

(b) A bounded set $A$ with equal inner and outer Jordan content is said to have content or to be Jordan measurable, and we write $J_*A = JA = J^*A$, even though $J$ is not a measure. (Is this any worse than functions with infinite integrals being nonintegrable?)

(c) Infer from Theorem 68 and the Riemann-Lebesgue Theorem that $f : [a, b] \to [0, M]$ is Riemann integrable if and only if its undergraph is Jordan measurable, and in that case its Riemann integral equals $J(\mathcal{U}f)$.



*15. Construct a Jordan curve (homeomorphic copy of the circle) in $\mathbb{R}^2$ that has
positive planar measure. [Hint: Given a Cantor set in the plane, is there a
Jordan curve that contains it? Is there a Cantor set in the plane with positive
planar measure? (Take another look at Section 9 in Chapter 2.)]

16. Write out the proofs of Lemmas 23, 24, and 25 in the n-dimensional case. 

17. Write out the proofs of the Measurable Product Theorem (Theorem 21) and 
the Zero Slice Theorem (Theorem 26) in the unbounded, n-dimensional case. 

**18. Suppose that $E$ is measurable.

(a) If $E \subset \mathbb{R}$ and $\epsilon > 0$ is given, prove there exists a fat Cantor set $F \subset E$ such that $mE < m(F) + \epsilon$. [Hint: Review Exercise 2.151.]

(b) Do the same in $\mathbb{R}^n$.

(c) Do the same in $\mathbb{R}$ and $\mathbb{R}^n$ if $E$ is nonmeasurable but $m_*E > 0$. [Hint:

K e ] 

**19. Consider linear Lebesgue measure $m_1$ on the interval $I$ and planar Lebesgue measure $m_2$ on the square $I^2$. Construct a meseometry $I \to I^2$. Thus meseometry disrespects topology: $(I, \mathcal{M}(I), m_1)$ is meseometric to $(I^2, \mathcal{M}(I^2), m_2)$. [Hint: You might use the following outline. The inclusion $I \setminus \mathbb{Q} \to I$ is injective and preserves $m_1$. You can convert it to a bijection $\alpha : I \setminus \mathbb{Q} \to I$ by choosing a countable set $L \subset I \setminus \mathbb{Q}$ and then choosing any bijection $\alpha_0 : L \to L \cup (\mathbb{Q} \cap I)$. Then you can set $\alpha(x) = \alpha_0(x)$ when $x \in L$ and $\alpha(x) = x$ otherwise. Why is $\alpha$ a meseometry? (Already this shows that nonhomeomorphic spaces can have meseometric measure spaces.) In the same way there is a meseometry $\beta : I^2 \setminus \mathbb{Q}^2 \to I^2$. Then let $A = I \setminus \mathbb{Q}$. Express $x \in A$ as a base-2 expansion

$$x = (a_1 a_2 a_3 a_4 a_5 a_6 \dots)$$

using the digits 0 and 1. It is unique since $x$ is irrational. Then consider the corresponding base-4 expansion

$$\sigma(x) = ((a_1 a_2)(a_3 a_4)(a_5 a_6) \dots)$$

using the digits (00), (01), (10), and (11). Prove that $\sigma(A) = I^2 \setminus \mathbb{Q}^2$ and $\sigma$ preserves measure. Conclude that $T = \beta \circ \sigma \circ \alpha^{-1}$ is a meseometry $I \to I^2$.]

20. Generalize Exercise 19 with $\mathbb{R}$ in place of $I$ and then with $\mathbb{R}^n$ in place of $\mathbb{R}$.

*21. Suppose that $U, V \subset \mathbb{R}^n$ are open. If a homeomorphism $T : U \to V$ and its inverse send Lebesgue zero sets to Lebesgue zero sets prove that it is a Lebesgue meseomorphism $(U, \mathcal{M}(U), m|_U) \to (V, \mathcal{M}(V), m|_V)$. [Note that the homeomorphism $T : \mathbb{R} \to \mathbb{R}$ which sends the fat Cantor set to the standard Cantor set sends zero sets to zero sets but $T^{-1}$ does not.]

22. If $U, V \subset \mathbb{R}^n$ are open and $T : U \to V$ is a Lipeomorphism (i.e., a Lipschitz homeomorphism with Lipschitz inverse) use Exercise 21 to show that $T$ is a meseomorphism with respect to Lebesgue measure.
23. Use Exercise 22 and the $n$-dimensional Mean Value Theorem to prove that a
diffeomorphism $T : U \to V$ is a meseomorphism. [Pay attention to the fact
that $U$ and $V$ are noncompact.]

24. (a) If $T : \mathbb{R} \to \mathbb{R}$ is a continuous meseometry prove that $T$ is rigid.

(b) What if $T$ is discontinuous?

(c) Find a continuous nonrigid meseometry $T : \mathbb{R}^2 \to \mathbb{R}^2$. [Hint: Divergence.]

25. Let $f : \mathbb{R} \to [0, \infty)$ be given.

(a) If $f$ is measurable why is the graph of $f$ a zero set?

(b) If the graph of $f$ is a zero set does it follow that $f$ is measurable?

**(c) Read about transfinite induction and go to stackexchange to see that there exists a nonmeasurable function $f : [a, b] \to [0, \infty)$ whose graph is nonmeasurable.

(d) Infer that the measurability hypothesis in the Zero Slice Theorem (Theorem 26) is necessary since every vertical slice of the graph of the function in (c) is a zero set (it is just a single point) and yet the graph has positive outer measure.

(e) Why can a graph never have positive inner measure?

(f) How does (c) yield an example of uncountably many disjoint subsets of the plane, each with infinite outer measure?

(g) What assertion can you make from (f) and Exercise 19?

26. Theorem 35 states that $T_f$ is a meseometry when $f : \mathbb{R} \to [0, \infty)$ is integrable. Prove the same thing when $f : \mathbb{R} \to [0, \infty)$ is measurable. What about a measurable function $\mathbb{R}^n \to \mathbb{R}$? [Hint: Express $f$ as $\sum_{i,k} f_{i,k}$ where the support of $f_{i,k}$ is $[i - 1, i) \cap f^{\mathrm{pre}}([k - 1, k))$. Why is $f_{i,k}$ integrable and how does this imply that $T_f$ is a meseometry?]

27. Using the undergraph definition, check linearity of the integral directly for the
two measurable characteristic functions, $f = \chi_F$ and $g = \chi_G$.

28. The total undergraph of $f : \mathbb{R} \to \mathbb{R}$ is $\mathbb{U}f = \{(x,y) : y < f(x)\}$.

(a) Using undergraph pictures, show that the total undergraph is measurable if and only if the positive and negative parts of $f$ are measurable.

(b) Suppose that $f : \mathbb{R} \to (0, \infty)$ is measurable. Prove that $1/f$ is measurable. [Hint: The diffeomorphism $T : (x, y) \mapsto (x, 1/y)$ sends $\mathcal{U}f$ to $\mathcal{U}(1/f)$.]

(c) Suppose that $f, g : \mathbb{R} \to (0, \infty)$ are measurable. Prove that $f \cdot g$ is measurable. [Hint: $T : (x,y) \mapsto (x, \log y)$ sends $\mathcal{U}f$ and $\mathcal{U}g$ to $\mathbb{U}(\log f)$ and $\mathbb{U}(\log g)$. How does this imply $\log fg$ is measurable, and how does use of $T^{-1} : (x,y) \mapsto (x, e^y)$ complete the proof?]

(d) Remove the hypotheses in (a)-(c) that the domain of $f, g$ is $\mathbb{R}$.

(e) Generalize (c) to the case that $f, g$ have both signs.




29. A function $f : M \to \mathbb{R}$ is upper semicontinuous if

$$\lim_{k \to \infty} x_k = x \implies \limsup_{k \to \infty} f(x_k) \le f(x).$$

($M$ can be any metric space.) Equivalently, $\limsup_{y \to x} f(y) \le f(x)$.

(a) Draw a graph of an upper semicontinuous function that is not continuous.

(b) Show that upper semicontinuity is equivalent to the requirement that for every open ray $(-\infty, a)$, the preimage $f^{\mathrm{pre}}(-\infty, a)$ is an open set.

(c) Lower semicontinuity is defined similarly. Work backward from the fact that the negative of a lower semicontinuous function is upper semicontinuous to give the definition in terms of lim infs.

30. Given a compact set $K \subset \mathbb{R} \times [0, \infty)$ define

$$g(x) = \begin{cases} \max\{y : (x, y) \in K\} & \text{if } K \cap (x \times \mathbb{R}) \neq \emptyset \\ 0 & \text{otherwise.} \end{cases}$$

Prove that $g$ is upper semicontinuous.

31. Prove that a measurable function $f$ is sandwiched as $u \le f \le v$, where $u$ is upper semicontinuous, $v$ is lower semicontinuous, and $v - u$ has small integral. [Hint: Exercise 30 and regularity.]

32. Prove Proposition 38.

33. Suppose that $f_k : [a, b] \to \mathbb{R}$ converges almost everywhere to $f$ as $k \to \infty$.

(a) Verify that the Dominated Convergence Theorem fails if there is no integrable dominating function $g$.

(b) Verify that the inequality in Fatou’s Lemma can be strict.

34. Suppose that $f_n : \mathbb{R} \to [0, \infty)$ is a sequence of integrable functions, $f_n \downarrow f$ a.e. as $n \to \infty$, and $\int f_n \downarrow 0$. Prove that $f = 0$ almost everywhere.

35. Find a sequence of integrable functions $f_k : [a, b] \to [0, 1]$ such that $\int_a^b f_k \to 0$ as $k \to \infty$ but it is not true that $f_k(x)$ converges to 0 a.e.

36. Show that the converse to the Dominated Convergence Theorem fails in the following sense: There exists a sequence of functions $f_k : [a, b] \to [0, \infty)$ such that $f_k \to 0$ almost everywhere and $\int_a^b f_k \to 0$ as $k \to \infty$, but there is no integrable dominator $g$. [Hint: Stare at the graph of $f(x) = 1/x$.]

37. Suppose that a sequence of integrable functions $f_k$ converges almost everywhere to $f$ as $k \to \infty$ and $f_k$ takes on both positive and negative values. If there exists an integrable function $g$ such that for almost every $x$ we have $|f_k(x)| \le g(x)$, prove that $\int f_k \to \int f$ as $k \to \infty$.

38. If $f$ and $g$ are integrable prove that their maximum and minimum are integrable.

39. Suppose that $f$ and $g$ are measurable and their squares are integrable. Prove
that $fg$ is measurable, integrable, and

$$\int fg \le \sqrt{\int f^2}\,\sqrt{\int g^2}.$$

[Hint: Exercise 28 helps.]

40. Find an example where Exercise 39 fails if “square integrable” is replaced with 
“integrable.” 

41. Suppose that $f_k$ is a sequence of integrable functions and $\sum \int |f_k| < \infty$. Prove
that $\sum f_k$ is integrable and

$$\int \sum_{k} f_k = \sum_{k} \int f_k.$$


*42. Prove that 




43. Prove that $g(y) = \int_0^\infty e^{-x} \sin(x + y)\, dx$ is differentiable.

44. Write out the proof of the multidimensional Cavalieri’s Principle (Theorem 39).

45. As in Corollary 41 we say that a function $f : \mathbb{R} \to \mathbb{R}$ is preimage measurable if for each $a \in \mathbb{R}$ the set $f^{\mathrm{pre}}([a, \infty)) = \{x \in \mathbb{R} : a \le f(x)\}$ is Lebesgue measurable. This is the standard definition for measurability of a function. Prove that the following are equivalent conditions for preimage measurability of $f : \mathbb{R} \to \mathbb{R}$.

(a) The preimage of every closed ray $[a, \infty)$ is measurable.

(b) The preimage of every open ray $(a, \infty)$ is measurable.

(c) The preimage of every closed ray $(-\infty, a]$ is measurable.

(d) The preimage of every open ray $(-\infty, a)$ is measurable.

(e) The preimage of every half-open interval $[a, b)$ is measurable.

(f) The preimage of every open interval $(a, b)$ is measurable.

(g) The preimage of every half-open interval $(a, b]$ is measurable.

(h) The preimage of every closed interval $[a, b]$ is measurable.

(i) The preimage of every open set is measurable.

(j) The preimage of every closed set is measurable.

(k) The preimage of every $G_\delta$-set is measurable.

(l) The preimage of every $F_\sigma$-set is measurable.
*46. Here is a trick question: “Are there any functions for which the Riemann integral converges but the Lebesgue integral diverges?” Corollary 70 would suggest the answer is “no.” Show, however, that the improper Riemann integral $\int_0^1 f(x)\, dx$ of the function

$$f(x) = \begin{cases} \dfrac{\pi}{x} \sin \dfrac{\pi}{x} & \text{if } x \neq 0 \\ 0 & \text{if } x = 0 \end{cases}$$

exists (and is finite) while the Lebesgue integral is infinite. [Hint: Integration by parts gives

$$\int_a^1 \frac{\pi}{x} \sin \frac{\pi}{x}\, dx = x \cos \frac{\pi}{x} \Big|_a^1 - \int_a^1 \cos \frac{\pi}{x}\, dx.$$

Why does this converge to a limit as $a \to 0^+$? To check divergence of the Lebesgue integral, consider intervals $[1/(k + 1), 1/k]$. On such an interval the sine of $\pi/x$ is everywhere positive or everywhere negative. The cosine is $+1$ at one endpoint and $-1$ at the other. Now use the integration by parts formula again and the fact that the harmonic series diverges.]

*47. A nonnegative linear combination of measurable characteristic functions is a simple function. That is,

$$\phi(x) = \sum_{i=1}^{n} c_i \chi_{E_i}(x)$$

where $E_1, \dots, E_n$ are measurable sets and $c_1, \dots, c_n$ are nonnegative constants. We say that $\sum c_i \chi_{E_i}$ “expresses” $\phi$. If the sets $E_i$ are disjoint and the coefficients $c_i$ are distinct and positive then the expression for $\phi$ is called canonical.

(a) Show that a canonical expression for a simple function exists and is unique.

(b) It is obvious that the integral of $\phi = \sum c_i \chi_{E_i}$ (the measure of its undergraph) equals $\sum c_i m(E_i)$ if the expression is the canonical one. Prove carefully that this remains true for every expression of a simple function.

(c) Infer from (b) that $\int \phi + \psi = \int \phi + \int \psi$ for simple functions.

(d) Given measurable $f, g : \mathbb{R} \to [0, \infty)$, show that there exist sequences of simple functions $\phi_n \uparrow f$ and $\psi_n \uparrow g$ as $n \to \infty$.

(e) Combine (c) and (d) to revalidate linearity of the integral.

In fact this is often how the Lebesgue integral is developed. A “preintegral” is constructed for simple functions, and the integral of a general nonnegative measurable function is defined to be the supremum of the preintegrals of lesser simple functions.

*48. The Devil’s ski slope. Recall from Chapter 3 that the Devil’s staircase function $H : [0, 1] \to [0, 1]$ is continuous, nondecreasing, constant on each interval complementary to the standard Cantor set, and yet is surjective. For $n \in \mathbb{Z}$ and $x \in [0, 1]$ we define $H(x + n) = H(x) + n$. This extends $H$ to a continuous surjection $\mathbb{R} \to \mathbb{R}$. Then we set

$$H_k(x) = H(3^k x) \quad \text{and} \quad J(x) = \sum_{k=0}^{\infty} \frac{H_k(x)}{4^k}.$$

Prove that $J$ is continuous, strictly increasing, and yet $J' = 0$ almost everywhere. [Hint: Fix $a > 0$ and let

$$S_a = \{x : J'(x) \text{ exists, } J'(x) > a, \text{ and } x \text{ belongs to the constancy intervals of every } H_k\}.$$

Use the Vitali Covering Lemma to prove that $m^*(S_a) = 0$.]

*49. Prove that $f : \mathbb{R} \to \mathbb{R}$ is Lebesgue measurable if and only if the preimage of
every Borel set is Lebesgue measurable. What about $f : \mathbb{R}^n \to \mathbb{R}$?

*50. (a) Prove Corollary 67: Each measurable $E \subset \mathbb{R}$ with $mE > 0$ contains a nonmeasurable set $N$ with $m^*N = mE$, $m_*N = 0$, and for each measurable $E' \subset E$ we have $m(E') = m^*(N \cap E')$. ($N$ is a “doppelgänger” of $E$.) [Hint: Try $N = P \cap E$ when $E \subset [0, 1)$ and $P$ is the nonmeasurable set from Theorem 66.]

(b) Is $N$ uniquely determined (modulo a zero set) by $E$?

51. Generalize Theorem 66 and Exercise 50 to $\mathbb{R}^n$. [Hint: Think about $P \times P$ and its complement in $I^2$.]

Remark There are even worse situations. $\mathbb{R}^n$ is the disjoint union of $\#\mathbb{R}$ sets like $P$. This fact involves “Bernstein sets” and transfinite induction. See also Exercise 25.


52. Prove Corollary 50 from Theorem 49. 

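53. Consider the function $f : \mathbb{R}^2 \to \mathbb{R}$ defined by

$$f(x,y) = \begin{cases} \dfrac{1}{y^2} & \text{if } 0 < x < y < 1 \\ -\dfrac{1}{x^2} & \text{if } 0 < y < x < 1 \\ 0 & \text{otherwise.} \end{cases}$$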


(a) Show that the iterated integrals exist and are finite (calculate them) but 
the double integral does not exist. 

(b) Explain why (a) does not contradict Corollary 43. 
54. Do (A) or (B), but not both.

(A) (a) State and prove Cavalieri’s Principle in dimension 4. 

(b) Formulate the Fubini-Tonelli theorem for triple integrals and use (a) 
to prove it. 

(B) (a) State Cavalieri’s Principle in dimension n + 1. 

(b) State the Fubini-Tonelli Theorem for multiple integrals and use (a) to 
prove it. 

How short can you make your answers? 

55. (a) What are the densities (upper, lower, balanced, and general) of the disc 

in the plane and at which points do they occur? 

(b) What about the densities of the square? 

***(c) What about the densities of the fat Cantor set? 

56. Suppose that $P \subset \mathbb{R}$ has the property that for every interval $(a, b) \subset \mathbb{R}$ we have

$$\frac{m^*(P \cap (a,b))}{b - a} = \frac{1}{2}.$$

(a) Prove that P is nonmeasurable. [Hint: This is a one-liner.] 

(b) Is there anything special about 1/2? 

57. Formulate and prove Steinhaus’ Theorem (Theorem 64) in n-space. 

58. The balanced density of a measurable set $E$ at $x$ is the limit, if it exists, of the concentration of $E$ in $B$ where $B$ is a ball centered at $x$ that shrinks down to $x$. Write $\delta_{\mathrm{balanced}}(x, E)$ to indicate the balanced density, and if it is 1, refer to $x$ as a balanced density point.

(a) Why is it immediate from the Lebesgue Density Theorem that almost every point of $E$ is a balanced density point?

(b) Given $\alpha \in [0, 1]$, construct an example of a measurable set $E \subset \mathbb{R}$ that contains a point $x$ with $\delta_{\mathrm{balanced}}(x, E) = \alpha$.

(c) Given $\alpha \in [0, 1]$, construct an example of a measurable set $E \subset \mathbb{R}$ that contains a point $x$ with $\delta(x, E) = \alpha$.

**(d) Is there a single set that contains points of both types of density for all $\alpha \in [0, 1]$?

59. Prove that the density points of a measurable set are the same as its balanced 
density points. [Hint: Exercise 62 is relevant.] 

*60. Density is defined using cubes $Q$ that shrink down to $p$. What if $p$ need not 
belong to $Q$, but its distance to $Q$ is on the order of the edgelength $\ell$ of $Q$? 
That is, $d(p, Q) \le K\ell$ for some constant $K$ as $\ell \to 0$. ($Q$ is a satellite of $p$.) 
Do we get the same set of density points? 

*61. As indicated in Appendix F, $U \subset \mathbb{R}^n$ is $K$-quasi-round if it can be sandwiched 
between balls $B \subset U \subset B'$ such that $\operatorname{diam} B' \le K \operatorname{diam} B$. 

(a) Prove that in the plane, squares and equilateral triangles are (uniformly) 
quasi-round. (The same $K$ works for all of them.) 

(b) What about isosceles triangles? 

(c) What about annuli of inner radius r and outer radius R such that R/r < 
10, and what about balanced density for such annuli? [Hint: Draw a 
picture.] 

(d) Formulate a Vitali Covering Lemma for a Vitali covering $\mathcal{V}$ of $A \subset \mathbb{R}^2$ by 
uniformly quasi-round sets instead of discs. 

(e) Prove it. 

(f) Generalize to $\mathbb{R}^n$. 

[Hint: Review the proof of the Vitali Covering Lemma.] 

*62. Consider a measure-theoretic definition of $K$-quasi-roundness of a measurable 
$W \subset \mathbb{R}^n$ as 

$$\frac{\operatorname{diam}(W)^n}{mW} \le K.$$

(a) What is the relation between the two definitions of quasi-roundness? 

(b) Fix a point $p \in \mathbb{R}^n$ and let $\mathcal{W}$ be the family of measurable sets containing 
$p$ which are $K$-quasi-round in the measure-theoretic sense. Prove that $p$ 
is a density point of a measurable set $E$ if and only if the concentration of 
$E$ in $W$ tends to 1 as $W \in \mathcal{W}$ shrinks to $p$. [Hint: Each $W$ could be a 
fat Cantor set, but take heart from the realization that if 99% of a set is 
red then 10% of it is quite pink.] 

63. Let $E \subset \mathbb{R}^n$ be measurable and let $x$ be a point of $\partial E$, the topological boundary 
of $E$. (That is, $x$ lies in both the closure of $E$ and the closure of $E^c$.) 

(a) Is it true that if the density $\delta = \delta(x, E)$ exists then $0 < \delta < 1$? Proof or 
counterexample. 

(b) Is it true that if $\delta = \delta(x, E)$ exists and $0 < \delta < 1$ then $x$ lies in $\partial E$? Proof 
or counterexample. 

(c) What about balanced density? 

64. Choose a pair of derivates other than the right max and left min. If $f$ is 
monotone write out a proof that these derivates are equal almost everywhere. 

65. Exercise 3.34 asks you to prove that the set of critical values of a $C^1$ function 
$f : \mathbb{R} \to \mathbb{R}$ is a zero set. (A critical point of $f$ is a point $p$ such that $f'(p) = 0$ 
and a critical value of $f$ is a $q \in \mathbb{R}$ such that $f(p) = q$ for some critical point $p$.) 
Give it another try. 

(a) What are the critical points and critical values of the function $\sin x$? 

(b) If $f : [a, b] \to \mathbb{R}$ is $C^1$ why are the sets of critical points and critical values, 
$\operatorname{cp}(f)$ and $\operatorname{cv}(f)$, compact? 

(c) How can you cover $\operatorname{cv}(f)$ with finitely many intervals of small total length? 
[Hint: Mean Value Theorem as an inequality.] 

(d) How can you go from $[a, b]$ to $\mathbb{R}$? 
66. Construct a monotone function $f : [0, 1] \to \mathbb{R}$ whose discontinuity set is exactly 
the set $\mathbb{Q} \cap [0, 1]$, or prove that such a function does not exist. 

*67. In Section 10 the total variation of a function $f : [a, b] \to \mathbb{R}$ is defined as the 
supremum of all sums $\sum_{k=1}^{n} |\Delta_k f|$, where $P$ partitions $[a, b]$ into subintervals 
$[x_{k-1}, x_k]$ and $\Delta_k f = f(x_k) - f(x_{k-1})$. Assume that the total variation of $f$ is 
finite (i.e., $f$ is of bounded variation) and define 

$$T_a^x = \sup_P \Big\{ \sum_k |\Delta_k f| \Big\}$$

$$P_a^x = \sup_P \Big\{ \sum_k \Delta_k f : \Delta_k f \ge 0 \Big\}$$

$$N_a^x = -\inf_P \Big\{ \sum_k \Delta_k f : \Delta_k f < 0 \Big\}$$

where $P$ ranges through all partitions of $[a, x]$. Prove that 

(a) $f$ is bounded. 

(b) $T_a^x$, $P_a^x$, $N_a^x$ are monotone nondecreasing functions of $x$. 

(c) $T_a^x = P_a^x + N_a^x$. 

(d) $f(x) = f(a) + P_a^x - N_a^x$. 

*68. Assume that $f : [a, b] \to \mathbb{R}$ has bounded variation. The Banach indicatrix is 
the function 

$$y \mapsto N_y = \# f^{\mathrm{pre}}(y).$$

$N_y$ is the number of roots of $f = y$. The horizontal line $[a, b] \times y$ meets the 
graph of $f$ in $N_y$ points. 

(a) Prove that $N_y < \infty$ for almost every $y$. 

(b) Prove that $y \mapsto N_y$ is measurable. 

(c) Prove that 

$$T_a^b = \int_c^d N_y \, dy$$

where $c \le \min f$ and $\max f \le d$. 

*69. (a) Assume that $A_n \uparrow A$ as $n \to \infty$ but do not assume that $A_n$ is measurable. 
Prove that $m^* A_n \to m^* A$ as $n \to \infty$. (This is upward measure continuity 
for outer measure. [Hint: Regularity gives $G_\delta$-sets $G_n \supset A_n$ with $m(G_n) = 
m^*(A_n)$. Can you make sure that $G_n$ increases as $n \to \infty$? If so, what can 
you say about $G = \bigcup G_n$?]) 

(b) Is upward measure continuity true for inner measure? [Proof or counterexample.] 

(c) What about downward measure continuity of inner measure? Of outer 
measure? 
*70. Let $A \subset \mathbb{R}^n$ be arbitrary, measurable or nonmeasurable. 

(a) Prove that the hull and kernel of $A$ are unique up to zero sets. 

(b) Prove that $A$ “spreads itself evenly” through its hull in the sense that for 
each measurable $E$ we have $m^*(A \cap E) = m(H_A \cap E)$. 

(c) Prove the following version of the Lebesgue Density Theorem. For almost 
every $p \in H_A$ we have 

$$\lim_{Q \downarrow p} \frac{m^*(A \cap Q)}{mQ} = 1.$$

[Hint: Review the proof of the Lebesgue Density Theorem. Taking $E = Q$ 
in (b) is useful in proving (c).] 

71. True or false: If $H_A$ is a measurable hull of $A$ then $H_A \setminus A$ is a zero set. 

72. If $N$ is a doppelgänger of a measurable set $E$ (Corollary 67 and Exercise 50) prove 
that $E$ is a measurable hull of $N$. (Thus $N$ is something like a “nonmeasurable 
kernel” of $E$.) 

*73. Prove that the outer measure of the Cartesian product of sets which are not 
necessarily measurable is the product of their outer measures. [Hint: If $H_A$ and 
$H_B$ are hulls of $A$ and $B$ use the Zero Slice Theorem to show that their product 
is a hull of $A \times B$.] 

*74. What about the inner measure of a product? 

75. Observe that under Cartesian products, measurable and nonmeasurable sets 
act like odd and even integers respectively. 

(a) Which theorem asserts that the product of measurable sets is measurable? 
(Odd times odd is odd.) 

(b) Is the product of nonmeasurable sets nonmeasurable? (Even times even is 
even.) 

(c) Is the product of a nonmeasurable set and a measurable set having nonzero 
measure always nonmeasurable? (Even times odd is even.) 

(d) Zero sets are special. They correspond to the number zero, an odd number 
in this imperfect analogy. (Zero times anything is zero.) 

*76. Exercise 3.18 asks you to prove that given a closed set $L \subset \mathbb{R}$, there is a $C^\infty$ 
function $\beta : \mathbb{R} \to [0, \infty)$ whose zero locus $\{x : \beta(x) = 0\}$ equals $L$. Give it 
another try. Can you also do it in $\mathbb{R}^n$? 

77. Suppose that $F \subset [0,2]$ is a fat Cantor set of measure 1. Prove that there is 
a $C^\infty$ homeomorphism $h : \mathbb{R} \to \mathbb{R}$ that carries $[0, 2]$ to $[0, 1]$ and sends $F$ to a 
Cantor set $hF$ of measure zero. [Hint: Use a $\beta$ from Exercise 76 and a constant 
$c$ to define $hx$ as $c \int_0^x \beta(t)\, dt$. How does Exercise 3.34 help?] 

78. Suppose that $f : \mathbb{R} \to [0, \infty)$ is Lebesgue measurable and $g : [0, \infty) \to [0, \infty)$ is 
monotone or continuous. Prove that $g \circ f$ is Lebesgue measurable. [Hint: Use 
the preimage definition of measurability and Exercise 45.] 
79. (a) For a bijection $h$ verify that $\chi_A = \chi_{hA} \circ h$. 

(b) Let $h : \mathbb{R} \to \mathbb{R}$ be the smooth homeomorphism supplied by Exercise 77. 
Why does $F$ contain a nonmeasurable set $P$ and why is $hP$ measurable? 

(c) Why is the nonmeasurable function $\chi_P$ the composition $\chi_{hP} \circ h$? 

(d) Infer that a continuous function following a Lebesgue measurable function 
is Lebesgue measurable (Exercise 78) but a Lebesgue measurable function 
following a continuous (or even smooth) function may fail to be Lebesgue 
measurable. 

80. Let $h : [0,2] \to [0,1]$ be the smooth homeomorphism supplied by Exercise 77 
and let $P \subset F$ be nonmeasurable. Set $f_n(x) = 0$ for all $n$, $x$. 

(a) Is it true that the functions $f_n$ are Borel measurable and converge almost 
everywhere to $\chi_{hP}$? 

(b) Is $\chi_{hP}$ Lebesgue measurable? 

(c) Is $\chi_{hP}$ Borel measurable? 

(d) Infer that if a sequence of Borel measurable functions converges almost 
everywhere to a limit function then that limit function may fail to be Borel 
measurable. 

81. Improve the Average Value Theorem to assert that not only is it true that for 
almost every $p$ the average $\frac{1}{mQ}\int_Q f \, dm \to f(p)$ as $Q \downarrow p$, but actually for almost 
every $p$ we have 

$$\lim_{Q \downarrow p} \frac{1}{mQ} \int_Q |f - f(p)| \, dm = 0.$$

[Hint: Apply the Average Value Theorem to each of the countably many functions 
$|f - r|$ where $r \in \mathbb{Q}$.] 

**82. Use the Improved Average Value Theorem from Exercise 81 to give a second 
proof of Lusin’s Theorem that does not use countable bases or preimage mea- 
surability. 

83. Suppose that $(f_k)$ is a sequence of measurable functions that converge almost 
everywhere to $f$ as $k \to \infty$. 

(a) Formulate and prove Egoroff’s Theorem if the functions are defined on a 
box in n-space. 

(b) Is Egoroff’s Theorem true or false for a sequence of functions defined on 
an unbounded set having finite measure? 

(c) Give an example of a sequence of functions defined on $\mathbb{R}$ for which Egoroff’s 
Theorem fails. 

(d) Prove that if the functions are defined on $\mathbb{R}^n$ and $\epsilon > 0$ is given then there 
is an $\epsilon$-set $S \subset \mathbb{R}^n$ such that for each compact $K \subset \mathbb{R}^n$, the sequence of 
functions restricted to $K \cap S^c$ converges uniformly. 
84. Why does Lusin’s Theorem imply that if $f : B \to \mathbb{R}$ is measurable and $B \subset \mathbb{R}^n$ 
is bounded then $f$ is nearly uniformly continuous? What if $B$ is unbounded 
but has finite measure? 

*85. Show that nearly uniform convergence is transitive in the following sense. Assume 
that $f_n$ converges nearly uniformly to $f$ as $n \to \infty$, and that for each fixed 
$n$ there is a sequence $f_{n,k}$ which converges nearly uniformly to $f_n$ as $k \to \infty$. 
(All the functions are measurable and defined on $[a, b]$.) 

(a) Show that there is a sequence $k(n) \to \infty$ as $n \to \infty$ such that $f_{n,k(n)}$ 
converges nearly uniformly to $f$ as $n \to \infty$. In symbols 

$$\operatorname*{nulim}_{n \to \infty} \, \operatorname*{nulim}_{k \to \infty} f_{n,k} = f \;\Rightarrow\; \operatorname*{nulim}_{n \to \infty} f_{n,k(n)} = f.$$

(b) Why does (a) remain true when almost everywhere convergence replaces 
nearly uniform convergence? [Hint: The answer is one word.] 

(c) Is (a) true when $\mathbb{R}$ replaces $[a, b]$? 

(d) Is (b) true when $\mathbb{R}$ replaces $[a, b]$? 

86. Consider the continuous functions 

$$f_{n,k}(x) = (\cos(\pi n! \, x))^k$$

for $k, n \in \mathbb{N}$ and $x \in \mathbb{R}$. 

(a) Show that for each $x \in \mathbb{R}$, 

$$\lim_{n \to \infty} \lim_{k \to \infty} f_{n,k}(x) = \chi_{\mathbb{Q}}(x),$$

the characteristic function of the rationals. 

(b) Infer from Exercise 24 in Chapter 3 that there can not exist a sequence 
$f_{n,k(n)}$ converging everywhere as $n \to \infty$. 

(c) Interpret (b) to say that everywhere convergence can not replace almost 
everywhere convergence or nearly uniform convergence in Exercise 85. 
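
To get a feel for the double limit in part (a), here is a minimal sketch that evaluates $(\cos(\pi n! x))^k$ at rational points. It is an illustration under stated assumptions, not part of the exercise: the function name `f` and the sample points are chosen here, and $n!\,x$ is reduced mod 2 exactly with `Fraction` because floating point loses the fractional part of $n!\,x$ for large $n$. Floats cannot hold an irrational $x$, so $x = 1/101$ is used as a stand-in that behaves like an irrational only while $n < 101$.

```python
import math
from fractions import Fraction

def f(n, k, x):
    """(cos(pi * n! * x))**k for a rational x, with n!*x reduced mod 2 exactly."""
    r = (math.factorial(n) * x) % 2      # exact Fraction in [0, 2)
    return math.cos(math.pi * float(r)) ** k

# n! * (1/3) is an even integer once n >= 3, so the cosine is exactly 1.
print(f(5, 1000, Fraction(1, 3)))

# For n < 3 the cosine has absolute value below 1 and the k-th power collapses.
print(f(2, 1000, Fraction(1, 3)))

# 10! is not divisible by 101, so n! * x is not an integer here.
print(f(10, 1000, Fraction(1, 101)))
```

The order of the limits matters: for fixed $n$ the inner limit in $k$ is taken first, and only then is $n$ sent to infinity.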

87. (a) Prove that the measure-theoretic boundary of a measurable set $E$ is contained 
in its topological boundary, $\partial_m(E) \subset \partial E$. 

(b) Construct an example of a continuous function $f : [a, b] \to [0, M]$ such 
that $\partial(\mathcal{U}f) \ne \partial_m(\mathcal{U}f)$. [Hint: A picture is worth a thousand formulas.] 

88. Generalize Theorem 68 to functions of several variables. That is, prove that a 
bounded nonnegative function defined on a box in n-space is Riemann integrable 
if and only if the topological boundary of its undergraph is a zero set. 
89. The $L^1$-norm of the integrable function $f : [a, b] \to \mathbb{R}$ is $\|f\| = \int |f|$. This gives 
a metric on the set $\mathcal{L}$ of integrable functions $[a, b] \to \mathbb{R}$ as $d_{L^1}(f,g) = \|f - g\|$. 


** 


We say that $f_n$ $L^1$-converges to $g$ if $\|f_n - g\| \to 0$. 

(a) Prove that $\mathcal{L}$ is a complete metric space. 

(b) Prove that $\mathcal{R}$ is dense in $\mathcal{L}$ where $\mathcal{R}$ is the set of Riemann integrable 
functions. 

(c) Infer that $\mathcal{L}$ is the completion of $\mathcal{R}$ with respect to the $L^1$-metric. (This 
constructs Lebesgue integrals with minimal reference to Lebesgue measure.) 

(d) What happens if we replace $[a, b]$ with a box in $\mathbb{R}^n$? 

90. A theory of integration more general than Lebesgue’s is due to Arnaud Denjoy. 
Rediscovered by Ralph Henstock and Jaroslav Kurzweil, it is described 
in Robert McLeod’s book, The Generalized Riemann Integral. The definition 
is deceptively simple. Let $f : [a, b] \to \mathbb{R}$ be given. The Denjoy integral of 
$f$, if it exists, is a real number $I$ such that for each $\epsilon > 0$ there is a function 
$\delta : [a, b] \to (0, \infty)$ and 

$$\Big| \sum_{k=1}^{n} f(t_k) \, \Delta x_k - I \Big| < \epsilon$$

for all Riemann sums with $\Delta x_k < \delta(t_k)$, $k = 1, \dots, n$. (McLeod refers to the 
function $\delta$ as a “gauge” and to the intermediate points $t_k$ as “tags”.) 

(a) Verify that if we require the gauge $\delta(t)$ to be continuous then the Denjoy 
integral reduces to the Riemann integral. 

(b) Verify that the function 

$$f(x) = \begin{cases} 1/\sqrt{x} & \text{if } 0 < x \le 1 \\ 100 & \text{if } x = 0 \end{cases}$$

has Denjoy integral 2. [Hint: Construct gauges $\delta(t)$ such that $\delta(0) > 0$ 
but $\lim_{t \to 0^+} \delta(t) = 0$.] 

(c) Generalize (b) to include all functions defined on [a, b] for which the im- 
proper Riemann integral is finite. 

(d) Infer from (c) and Exercise 46 that some functions are Denjoy integrable 
but not Lebesgue integrable. 

(e) Read McLeod’s book to verify that every nonnegative Denjoy integrable 
function is Lebesgue integrable and the integrals are equal; and every 
Lebesgue integrable function is Denjoy integrable and the integrals are 
equal. Infer that the difference between Lebesgue and Denjoy corresponds 
to the difference between absolutely and conditionally convergent series 
- if $f$ is Lebesgue integrable, so is $|f|$, but this is not true for Denjoy 
integrals. 
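
The gauge idea in part (b) can be tried out numerically. The sketch below is only an experiment under stated assumptions: it takes $f(x) = 1/\sqrt{x}$ on $(0, 1]$ with $f(0) = 100$, and the particular gauge (`eps**2 / 50` at 0, `min(t/2, eps * t**1.5)` elsewhere) is one hypothetical choice made here, not McLeod's or the author's. It builds a single tagged partition that is fine with respect to this gauge and evaluates its Riemann sum; the definition demands the estimate for every such partition, which a program cannot check.

```python
import math

def f(x):
    # Assumed integrand from part (b): 100 at 0, 1/sqrt(x) on (0, 1].
    return 100.0 if x == 0 else 1.0 / math.sqrt(x)

def gauge(t, eps):
    """A hypothetical gauge: positive at 0, tending to 0 as t -> 0+."""
    if t == 0:
        return eps**2 / 50
    # t/2 forces any interval that contains 0 to be tagged at 0.
    return min(t / 2, eps * t**1.5)

def tagged_sum(eps):
    # The first interval [0, x1] is tagged at 0; every later interval is
    # tagged at its left endpoint. Each length is at most half the gauge
    # at its tag, so the partition is fine with respect to the gauge.
    x = 0.5 * gauge(0.0, eps)
    total = f(0.0) * x
    while x < 1.0:
        nxt = min(1.0, x + 0.5 * gauge(x, eps))
        total += f(x) * (nxt - x)
        x = nxt
    return total

print(tagged_sum(0.05))
```

Because the gauge at 0 is tiny, the arbitrary value $f(0) = 100$ contributes almost nothing, and the shrinking gauge near 0 keeps the sum from being fooled by the blow-up of $1/\sqrt{x}$.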
**91. Four types of convergence of a sequence of measurable functions $(f_n)$ are: Almost 
everywhere convergence, $L^1$ convergence, nearly uniform convergence, and 
convergence in measure. This last type of convergence requires that for each 
$\epsilon > 0$ we have 

$$m(\{x : |f_n(x) - g(x)| > \epsilon\}) \to 0$$

as $n \to \infty$. Consulting the tetrahedron in Figure 159, decide which oriented 
edges represent implications for sequences of functions defined on $[a, b]$, on $\mathbb{R}$, 
or represent implications on neither $[a, b]$ nor $\mathbb{R}$. 


[Figure: tetrahedron whose vertices are the four types of convergence.] 

Figure 159 You might label an edge that represents implication only for 
functions defined on $[a, b]$ with a single arrow, but use a double arrow if the 
implication holds for functions defined on $\mathbb{R}$. For example, how should you 
label the edge from a.e. to n.u.? 


**92. Assume that the (unbalanced) density of $E$ exists at every point of $\mathbb{R}$, not merely 
at almost all of them. Prove that up to a zero set, $E = \mathbb{R}$, or $E = \emptyset$. (This is a 
kind of measure-theoretic connectedness. Topological connectedness of $\mathbb{R}$ 
is useful in the proof.) Is this also true in $\mathbb{R}^n$? 

***93. [Speculative] Density seems to be a first-order concept. To say that the density 
of $E$ at $x$ is 1 means that the concentration of $E$ in a ball $B$ containing $x$ tends 
to 1 as $B \downarrow x$. That is, 

$$\frac{m(B) - m(E \cap B)}{mB} \to 0.$$

But how fast can we hope it tends to 0? We could call $x$ a double density 
point if the ratio still tends to 0 when we square the denominator. Interior 
points of $E$ are double density points. Are such points common or scarce in a 
measurable set? What about balanced density points? What about fractional 
powers of the denominator? 


Suggested Reading 


There are many books on more advanced analysis and topology. Among my favorites 
in the “not too advanced” category are these. 

1. Kenneth Falconer, The Geometry of Fractal Sets. 

Here you should read about the Kakeya problem: How much area is needed 
to reverse the position of a unit needle in the plane by a continuous motion? 
Falconer also has a couple of later books on fractals that are good. 

2. Thomas Hawkins, Lebesgue’s Theory of Integration. 

You will learn a great deal about the history of Lebesgue integration and anal- 
ysis around the turn of the last century from this book, including the fact that 
many standard attributions are incorrect. For instance, the Cantor set should 
be called the Smith set; Vitali had many of the ideas credited solely to Lebesgue, 
etc. Hawkins’ book is a real gem. 

3. John Milnor, Topology from the Differentiable Viewpoint. 

Milnor is one of the clearest mathematics writers and thinkers of the twentieth 
century. This is his most elementary book, and it is only seventy-six pages long. 

4. James Munkres, Topology, a First Course. 

This is a first-year graduate text that deals with some of the same material you 
have been studying. 

5. Robert Devaney, An Introduction to Chaotic Dynamical Systems. 

This is the book you should read to begin studying mathematical dynamics. It 
is first rate. 

One thing you will observe about all these books - they use pictures to convey 
the mathematical ideas. Beware of books that don’t. 
Analyticity Theorem, 250 

Antiderivative Theorem, 185, 431 

Antoine’s Necklace, 117 

arc, 131 

area of a rectangle, 384 
argument by contradiction, 8 
Arzela-Ascoli Propagation Theorem, 227 
Arzela-Ascoli Theorem, 224 
ascending Z-tuple, 333 
associativity, 14, 335 
average derivative, 289 
Average Integral Theorem, 426 
Baire class 1, 201 

Baire’s Theorem, 256 

balanced density, 422, 458 

Banach Contraction Principle, 240 

Banach indicatrix, 460 

Banach space, 296 

basic form, 331 

Bernstein polynomial, 229 

bijection, 31 

bilinear, 287 

block test, 208 

Bolzano-Weierstrass Theorem, 80 

Borel measurability, 443 

Borel’s Lemma, 267 

boundary, 92, 141 

boundary of a k-cell, 343 

bounded above, 13 

bounded function, 98, 261 

bounded linear transformation, 279 

bounded metric, 138 

bounded set, 97 

bounded variation, 438 

box, 26 

Brouwer Fixed-Point Theorem, 240, 353 
bump function, 200 

Cantor function, 186 

Cantor Partition Lemma, 113 

Cantor piece, 112 

Cantor set, 105 

Cantor space, 112 

Cantor Surjection Theorem, 108 

cardinality, 31 

Cauchy completion, 122 

Cauchy condition, 18, 77 

Cauchy Convergence Criterion, 19, 191 

Cauchy product, 210 

Cauchy sequence, 77 

Cauchy-Binet Formula, 339, 363 

Cauchy-Riemann Equations, 360 

Cauchy-Schwarz Inequality, 23 

Cavalieri’s Principle, 318, 414 


cell, 328 

center of a starlike set, 130 

chain connected, 131 

Chain Rule, 150, 285 

Change of Variables Formula, 319 

characteristic function, 171 

Chebyshev Lemma, 434 

class $C^r$, 158, 295 

class $C^\infty$, 295 

clopen, 67 

closed form, 347 

closed neighborhood, 94 

closed set, 66 

closed set condition, 72 

closure, 70, 92 

cluster point, 92, 140 

co-Cauchy, 119 

codomain, 30 

coherent labeling, 110 

common refinement, 168 

commutative diagram, 302 

compact, 79 

comparable norms, 366 

Comparison Test, 192 

complement, 45 

complete, 14, 78 

completed undergraph, 407 

Completion Theorem, 119 

complex analytic, 251 

complex derivative, 360 

composite, 31 

concentration, 422 

condensation point, 92, 140 

condition number, 361 

conditional convergence, 192, 464 

cone map, 349 

cone on a metric space, 139 

connected, 86 

connected component, 147 

conorm, 281, 366 

continuity in a metric space, 61 
continuously differentiable, 157 
Continuum Hypothesis, 31, 137, 145 
contraction, weak contraction, 240, 266 
convergence, 18, 60, 191 
convex, 26 

convex combinations, 27, 49 
convex function, 49 
convex hull, 115 
countable, 31 

countable additivity, countable subadditivity, 384 

countable additivity, subadditivity, 389 

countable base, 141, 405, 447 

counting measure, 450 

covering, 98 

covering compact, 98 

critical point, critical value, 204, 459 

cube, 26 

Cupcake Theorem, 145 
curl, 347 

Darboux continuous, 154 

Darboux integrable, Darboux integral, 167 

de Rham cohomology, 352 

De Morgan’s Law, 45 

Dedekind cut, 12 

Denjoy integral, 464 

dense, 107 

density point, 422 

denumerable, 31 

derivate, 434 

derivative, 149 

derivative (multivariable), 282 
derivative growth rate, 248 
determinant, 363 
Devil’s ski slope, 188, 456 
Devil’s staircase function, 186 
diagonalizable matrix, 368 
diameter in a metric space, 82 
diffeomorphism, 163, 300 
differentiability of order r, 157 
differentiable (multivariable), 282 


differentiable function, 149, 151 
differential 1-form, 327 
differential quotient, 149 
differentiation past the integral, 290 
dipole, 343 

directional derivative, 369 
disconnected, 86 

discontinuity of the first, second kind, 204 
discrete metric, 58 
disjoint, 2 

distance from a point to a set, 130 
distance function, 58 
divergence of a series, 191 
divergence of a vector field, 346 
division of a metric space, 109 
domain, 29 

Dominated Convergence Theorem, 409 

domination of one series by another, 192 

doppelganger, 442 

dot product, 22 

double density point, 465 

dyadic, 47 

dyadic ruler function, 204 

Egoroff’s Theorem, 448 
embedding, 85 
empty set, 2 
envelope sequences, 408 
equicontinuity, 224 

equivalence relation, equivalence class, 3 

Euler characteristic, 50 

Euler’s Product Formula, 210 

exact form, 347 

exponential growth rate, 194 

extension of a function, 129 

exterior derivative, 337 

fat Cantor set, 108, 203 
Fatou’s Lemma, 410 
field, 16 
finite, 31 

finite additivity, 390 
finite intersection property, 134 
fixed-point, 47, 240 
flow, 246 
flux, 346 

Frechet derivative, 284 
Fubini’s Theorem, 316 
Fubini-Tonelli Theorem, 416 
function, 29 
function algebra, 234 
functional, 328 

Fundamental Theorem of Calculus, 183, 
426 

Fundamental Theorem of Continuous Functions, 41 

gap interval, 108, 112 
Gauss Divergence Theorem, 346 
Generalized Heine-Borel Theorem, 103 
generic, 256 
geometric series, 191 
gradient, 311 
grand intersection, 134 
greatest lower bound, 47 
Green’s Formula, 346 
growing steeple, 214 

Holder condition, 198 
Hahn-Mazurkiewicz Theorem, 143 
Hairy Ball Theorem, 381 
harmonic series, 192 
Hausdorff metric, 144 
Hawaiian earring, 132 
Heine-Borel Theorem, 80, 81 
Heine-Borel Theorem in a Function Space, 
228 
Higher Order Chain Rule, 374 
Higher Order Leibniz Rule, 199 
Hilbert cube, 143 
homeomorphism, 62 
hull, 400 
hyperspace, 144 

idempotent, 70 
identity map, 31 
Identity Theorem for analytic functions, 
268 
image, 30 
implicit function, 297 
Implicit Function Theorem, 298 
improper Riemann integral, 191 
inclusion cell, 334 
indicator function, 171 
infimum, 17 
infinite, 31 
infinite address string, 107 
infinite product, 209 
infinitely differentiable, 157 
Inheritance Principle, 73, 74 
inherited metric, 58 
inherited topology, 74 
initial condition for an ODE, 242 
injection, 30 
inner measure, 384 
inner product, inner product space, 28 
integer lattice, 24 
Integral Test, 193 
integrally equivalent, 205 
integration by parts, 189 
integration by substitution, 189 
interior, 92, 140 
Intermediate Value Theorem, 40 
Intermediate Value Theorem for $f'$, 154 
intrinsic property, 85 
Inverse Function Theorem, 162, 301 
inverse image, 71 
isometry, isometric, 126 
iterate, 138 

Jacobian, 319 
Jordan content, 319, 450, 451 
Jordan Curve Theorem, 144 
Jordan measurable, 451 
jump, jump discontinuity, 49, 204 

kernel, 400 
L’Hopital’s Rule, 153 
Lagrange form of the Taylor remainder, 
160 

Lagrange multiplier, 310 
least upper bound, 13 
Least Upper Bound Property, 14 
Lebesgue Density Theorem, 422 
Lebesgue Dominated Convergence Theo- 
rem, 409 

Lebesgue integrability, Lebesgue integral, 
406 

Lebesgue measurability, Lebesgue measure, 
389 

Lebesgue Monotone Convergence Theo- 
rem, 407 

Lebesgue number, 100 
Lebesgue outer measure, 383 
Lebesgue’s Antiderivative Theorem, 431 
Lebesgue’s Fundamental Theorem of Cal- 
culus, 426 

Lebesgue’s Main Theorem, 430, 439 
Leibniz Rule, 149, 285 
length of a vector, 23 
length of an interval, 383 
limit, 65 
limit point, 65 
limit set, 68 

linear transformation, 277 
Lipeomorphism, 452 
Lipschitz condition, 244 
locally path connected, 143 
locally path-connected, 132 
logarithm function, 186 
lower Lebesgue sum, 440 
lower sum, lower integral, 166 
Lusin’s Theorem, 447 

magnitude of a number, of a vector, 16, 
23 

Manhattan metric, 76 
map, mapping, 29 
maximum stretch, 279 


meager subset, 256 
mean value property, 151 
Mean Value Theorem, 151, 288 
measurability, measure, 389 
measurable function, 406 
Measurable Product Theorem, 401 
measurable with respect to an outer mea- 
sure, 389 

Measure Continuity Theorem, 392 
measure space, 397 
measure-theoretic connectedness, 465 
measure-theoretic interior, exterior, and 
boundary, 424 
Mertens’ Theorem, 210 
meseometry, 393, 397 
meseomorphism, 393, 397 
mesh of a partition, 164 
metric space, metric subspace, 57, 58 
middle-quarters Cantor set, 203 
middle-thirds Cantor set, 105 
minimum stretch, 366 
modulus of continuity, 264 
Monotone Convergence Theorem, 407 
monotonicity, 125 
Moore-Kline Theorem, 112 
Morse-Sard Theorem, 204 
multilinear functional, 352 

name of a form, 327 

natural numbers, 1 

nearly continuous, 447 

nearly uniform convergence, 448 

neighborhood, 70 

nested sequence, 81 

norm, normed space, 28, 279 

nowhere dense, 107 

ODE, 242 
one-to-one, 30 
onto, 30 

open covering, 98 
open mapping, 127 




open set, 66 
open set condition, 72 
operator norm, 279 
orbit, 138, 441 
ordered field, 16 
orthant, 24 

oscillating discontinuity, 205 
oscillation, 177 
outer measure, 383 

parallelogram law, 53 
partial derivative, 284 
partial product, 209 
partial sum, 191 
partition, 113 
partition pair, 164 
patches, 99 

path, path-connected, 90 
Peano curve, 112 
Peano space, 143 
perfect, 94 

Picard’s Theorem, 244 

piece of a compact metric space, 109 

piecewise continuous function, 172 

Poincare Lemma, 348 

pointwise convergence, pointwise limit 

pointwise equicontinuity, 224, 261 

Polar Form Theorem, 362 

positive definiteness, 58 

preimage, 71 

preimage measurability, 416 
proper subset, 86 
pullback, pushforward, 338 

quasi-round, 449 

Rademacher’s Theorem, 206, 438 
Radius of Convergence Theorem, 197 
range, 30 

rank, Rank Theorem, 301, 303 
Ratio Mean Value Theorem, 152 
Ratio Test, 195 


rational cut, 13 
rational numbers, 2 
rational ruler function, 173 
real number, 12 

rearrangement of a sequence, 126 
rearrangement of a series, 209 
reduction of a covering, 98 
Refinement Principle, 168 
regularity hierarchy, 158 
regularity of Lebesgue measure, 399 
regularity sandwich, 399 
retraction, 353 
Riemann ζ-function, 210 
Riemann integrability, Riemann integral, 
164 

Riemann measurable, 319 
Riemann sum, 164 

Riemann’s Integrability Criterion, 171 
Riemann-Lebesgue Theorem, 175 
Root Test, 194 

sample points, 164 
Sandwich Principle, 173 
Sard’s Theorem, 204 
satellite, 458 

sawtooth function, 254 

Schroeder-Bernstein Theorem, 36 
scraps, 99 

second derivative, 291 

separable metric space, 141 

separates points (function algebra), 234 

separation, 86 

shadow, 330 

shear matrix, 320, 368 

sign of a permutation, 363 

signed area, 330 

signed commutativity, 331 

simple closed curve, 144 

simple form, 331 

simple function, 456 

simple region, 377 

simply connected, 347 
singleton set, 2 

slice, 316, 403, 414 

sliding secant method, 155 

slope over an interval, 434 

smooth, 157, 295 

solution of an ODE, 242 

somewhere dense, 107 

space-filling, 112 

spherical shell, 379 

staircase curve, 376 

starlike, 130, 351 

steeple functions, 214 

Steinhaus’ Theorem, 441 

step function, 172 

Stokes’ Curl Theorem, 347 

Stokes’ Formula for a Cube, 343 

Stokes’ Formula for a general cell, 345 

Stone-Weierstrass Theorem, 234 

subcovering, 98 

subfield, 16 

sublinear, 282 

subsequence, 60 

sup norm, 214 

support of a function, 200 

supremum, 17 

surjection, 30 

tail of a series, 192 
tame, 116 
target, 30 
taxicab metric, 76 

Taylor Approximation Theorem, 160 

Taylor polynomial, 159 

Taylor series, 161, 248 

Taylor’s Theorem, 251 

Term by Term Integration Theorem, 219 

thick and thin subsets, 256 

topological equivalence, 73 

topological property, 71 

topological space, 67 

topologist’s sine circle, 132 

topologist’s sine curve, 91 


total derivative, 284 

total length of a covering, 108, 175, 384 

total undergraph, 453 

total variation of a function, 438 

totally bounded, 103 

totally disconnected, 105 

trajectory of a vector field, 243 

transcendental number, 51 

transformation, 29 

Triangle Inequality, 16 

Triangle Inequality for distance, 24 

Triangle Inequality for vectors, 24 

trichotomy, 16 

trigonometric polynomial, 238 
truncation of an address, 107 

ultrametric, 136 
unbounded set, 97 
uncountable, 31 
undergraph, 164, 406 
uniform C r convergence, 295 
uniform continuity, 52, 85 
uniform convergence, 211, 217 
uniform equicontinuity, 261 
unit ball, sphere, 26 
unit cube, 26 

universal compact metric space, 108 
upper semicontinuity, 147, 275, 454 
upper sum, upper integral, 166 
utility problem, 144 

vanishing at a point (function algebra), 
234 

vector field, 243, 346 

vector ODE, 242 

Vitali covering, 418 

Vitali Covering Lemma, 418, 422 

wedge product, 334 

Weierstrass Approximation Theorem, 228 
Weierstrass M-test, 217 

wild, 117 
Zeno’s staircase function, 174 
zero locus, 268, 461 
zero set, 108, 175, 315, 386 
Zero Slice Theorem, 403 
zeroth derivative, 157 


